Compiling C-- to JVM

Programming Language Technology, DAT151/DIT231

(Additional notes complementing the slides.)

Tasks of the compiler

  1. Produce binary code or symbolic assembly.

  2. Translate variable names to addresses

  3. Translate expression trees to stack instruction sequences

  4. Translate control structures and boolean operations to jumps

  5. Translate function calls into machine-level calls

Ad 1. Variable names to addresses

Jasmin types:

void foo                     //          size=0  max=0
  ( double x                 // x:0      size=2  max=2
  , int y) {                 // y:2      size=3  max=3
  int i;                     // i:3      size=4  max=4
  {                          // newBlock
     double y;               // y:4      size=6  max=6
     bool b;                 // b:6      size=7  max=7
  }                          // popBlock size=4  max=7
  int j;                     // j:4      size=5  max=7
}
.method public static foo(DI)V
.limit locals 7
.end method

Ad 2. Expressions trees to stack instructions

Example:

int main() {
  int i;                         // i:0
  double x = 3 * (i = 42) + i++; // x:1
}

Translates to JVM:

.method public static main()I
.limit stack 4

;; instr   ;; expr                     ;; stack (S)   ;; store (V)
iconst_3   ;; 3                        ;; 3
bipush 42  ;; 42                       ;; 3 . 42
istore_0   ;; i =                      ;; 3           ;; 0:42
iload_0    ;; i = 42                   ;; 3 . 42
imul       ;; 3 * (i = 42)             ;; 126
iload_0    ;; i                        ;; 126 . 42
dup        ;;                          ;; 126 . 42 . 42
iconst_1   ;; 1                        ;; 126 . 42 . 42 . 1
iadd       ;; i + 1                    ;; 126 . 42 . 43
istore_0   ;; i++                      ;; 126 . 42    ;; 0:43
iadd       ;; 3 * (i = 42) + i++       ;; 168
i2d        ;; (double)                 ;; 168.0
dstore_1   ;; x = 3 * (i = 42) + i++   ;;             ;; 0:43, 1:168.0

Maximum stack size: 4 (32bit words)

JVM instruction small-step semantics (without jumps):

    I : ⟨V,S⟩ → ⟨V',S'⟩

    I      : JVM instruction (or instruction sequence)
    V / V' : variable store before/after
    S / S' : stack before/after

Some jump-free instructions:

I V S V' S'
iconst i V S V S.i
bipush i V S V S.i
iadd V S.i.j V S.(i+j)
imul V S.i.j V S.(i*j)
i2d V S.i V S.(double)i
dup V S.i V S.i.i
istore a V S.i V[a:i] S
dstore a V S.d V[a:d] S
iload a V S V S.V[a]
dload a V S V S.V[a]

Ad 3. Booleans and jumps

Example:

bool inside(int i, int j, int k) {
  return i <= j && j < k;  
}

Machines typically do not have boolean operations; use jumps instead.

.method public static inside(III)Z
.limit locals 3
.limit stack 2

    iload_0             ;; i
    iload_1             ;; j
    if_icmpgt Lfalse    ;; i <= j  ;; false if i > j
    iload_1             ;; j
    iload_2             ;; k
    if_icmpge Lfalse    ;; j < k   ;; false if j >= k

    iconst_1            ;; true
    goto      Ldone

Lfalse:  
    iconst_0            ;; false

Ldone:  
    ireturn             ;; return

.end method

JVM instruction small-step semantics (with jumps):

    I : ⟨P,V,S⟩ → ⟨P',V',S'⟩

    P / P' : code position (program counter) before/after
I P V S P' V' S' Condition
goto L P V S L V S
ifeq L P V S.i L V S i = 0
ifeq L P V S.i P+1 V S i ≠ 0
ifne L P V S.i L V S i ≠ 0
ifne L P V S.i P+1 V S i = 0
if_icmpgt L P V S.i.j L V S i > j
if_icmpgt L P V S.i.j P+1 V S i ≤ j

Infrastructure of the compiler

Signature (global symbol table):

Context (local symbol table):

  1. allocate local variables (newLocal)
  2. resolve variable names to addresses (lookupVar)
  3. free variables when exiting a block (newBlock, popBlock)

State

  1. generate new labels
  2. manage local variable context
  3. keep track of maximum stack use
  4. append generated code (emit)

The emit function:

We specify the compiler by so-called compilation schemes (pseudo-code describing syntax-directed traversal).

Compiling expressions

An expression of type t is compiled to instructions whose effect is to leave a new value of type t on top of the stack, and leave the rest of the stack unchanged.

  compileExp(EInt i):  
    emit(iconst i)

  compileExp(EAdd t e₁ e₂):  
    compileExp(e₁)
    compileExp(e₂)
    emit(t-add)            -- either iadd or dadd

  compileExp(EVar x):  
    a <- lookupVar x
    emit(t-load a)         -- either iload or dload

  compileExp(ECall x es)
    for (e ∈ es):          -- compile function arguments
      compileExp(e)
    f <- lookupFun(x)      -- get the Jasmin name of function x
    emit(invokestatic f)   -- emit function call

  compileExp(EAss x e):  
    a <- lookupVar x
    compileExp(e)
    emit(t-store a)        -- either istore or dstore
    -- Problem here

Compiler correctness

JVM Small-step semantics without jumps:

    i : ⟨V,S⟩ → ⟨V',S'⟩

    i      : JVM instruction (or instruction sequence)
    V / V' : variable store before/after
    S / S' : stack before/after

We say γ ~ V if environment γ translates to variable store V.

Correctness statement (simplified):

If γ ⊢ e ⇓ ⟨v, γ'⟩ and γ ~ V then compileExp(e) : ⟨V,S⟩ →* ⟨V',S.v⟩ such that γ' ~ V'.

Correct translation of assignment:

  compileExp(EAss x e):  
    a <- lookupVar x
    compileExp(e)
    emit(t-store a)
    emit(t-load a)

Compiling statements

A statement (sequence) is compiled to instructions whose effect on the stack is nil (no change).

  compileStm(SInit t x e):  
    a <- newLocal t x
    compileExp(e)
    emit(t-store a)              -- either istore or dstore

  compileStm(SExp t e):  
    compileExp(e)
    emit(t-pop)                  -- either pop or pop2

  compileStm(SBlock ss):  
    newBlock
    for (s : ss)
      compileStm(s)
    popBlock

  compileStm(SWhile e s):        -- s is a SBlock
    LStart, LEnd <- newLabel

    emit(LStart:)

    compileExp(e)                -- executing e leaves boolean on stack
    emit(ifeq LEnd)              -- if "false" end loop

    compileStm(s)                -- body of loop
    emit(goto LStart)            -- recheck loop condition

    emit(LEnd:)

Correctness statement (simplified):

If γ ⊢ s ⇓ γ' and γ ~ V then compileStm(s) : ⟨V,S⟩ →* ⟨V',S⟩ such that γ' ~ V'.

(We omit emit in the following.)

Compiling booleans

The simplest compiler will compile boolean expressions to instructions that leave either 0 (for false) or 1 (for true) on the stack.
However, this leads to convoluted translation of control-flow statements like if and while.

Example:

  while (i < j) {...}

becomes

  L0:            ;; beginning of loop, check condition
  iload_1        ;; i
  iload_0        ;; j
  if_icmplt L2   ;; i < j ?  
  iconst_0       ;; no: put false
  goto L3
  L2:            ;; yes: put true
  iconst_1
  L3:            ;; boolean value of i < j is on top of the stack
  ifeq L1        ;; if top of stack is 0, exit while loop
  ...  
  goto L0        ;; repeat loop
  L1:            ;; end of loop

We would like to skip the computation of the boolean value, but jump directly according to the condition:

  L0:            ;; beginning of loop, check condition
  iload_1        ;; i
  iload_0        ;; j
  if_icmpge L1   ;; exit loop if i ≥ j
  ...  
  goto L0        ;; repeat loop
  L1:            ;; end of loop

We can get close to this in a compositional way by compiling boolean expressions as jumps to either label Ltrue (if the expression will get value true) or label Lfalse (otherwise).

compileBool (Exp e, Label Ltrue, Label Lfalse)

For compiling control-flow, we use it as follows:

compileStm(SWhile e s):  
  Lstart, Ltrue, Lfalse ← newLabel
  Lstart:  
  compileBool (e, Ltrue, Lfalse)
  Ltrue:  
  compileStm(s)
  goto Lstart
  Lfalse:  

compileStm(SIfElse e s₁ s₂):  
  ...  

compileBool is given by the following compilation schemes:

Sometimes booleans need to be represented by 0 and 1,
e.g. when assigning to a boolean variable:

    compileExp(e) | typeOf(e) == bool:  
      Ltrue, Lfalse, Ldone <- newLabel

      compileBool(e, Ltrue, Lfalse)
      Ltrue:  
      iconst_1
      goto Ldone

      Lfalse:  
      pop
      iconst_0

      Ldone:  

Compiling booleans with fall-through

The compileBool(e, Ltrue, Lfalse) function does not utilize fall-through in many cases, and thus uses more labels than strictly needed.

An alternative is a compileBool(e, Lfalse) that produces code that would jump to Lfalse whenever e evaluates to false, and fall through otherwise.

compileBool(EFalse, l):  
  goto l

compileBool(ETrue, l):  
  -- emit nothing

compileBool(EAnd e₁ e₂, l):  
  compileBool(e₁, l)
  compileBool(e₂, l)

-- if we had negation:  
compileBool(ENot e, l):  
  lTrue ← newLabel
  compileBool(e, lTrue)
  goto l
  lTrue:  

compileBool(EOr e₁ e₂, l):  
  lFalse, lTrue <- newLabel
  compileBool(e₁, lFalse)
  goto lTrue
  lFalse:  
  compileBool(e₂, l)
  lTrue:  

compileBool(ELt int e₁ e₂, l):  
  compileExp(e₁)
  compileExp(e₂)
  if_icmpge l