Programming Language Technology, DAT151/DIT231
(Additional notes complementing the slides.)
Produce binary code or symbolic assembly.
.class file by jasmin.Translate variable names to addresses
Translate expression trees to stack instruction sequences
Translate control structures and boolean operations to jumps
Translate function calls into machine-level calls
Jasmin types:
I: intD: doubleV: voidZ: booleanvoid foo // size=0 max=0
( double x // x:0 size=2 max=2
, int y) { // y:2 size=3 max=3
int i; // i:3 size=4 max=4
{ // newBlock
double y; // y:4 size=6 max=6
bool b; // b:6 size=7 max=7
} // popBlock size=4 max=7
int j; // j:4 size=5 max=7
}.method public static foo(DI)V
.limit locals 7
.end method
Example:
int main() {
int i; // i:0
double x = 3 * (i = 42) + i++; // x:1
}Translates to JVM:
.method public static main()I
.limit stack 4
;; instr ;; expr ;; stack (S) ;; store (V)
iconst_3 ;; 3 ;; 3
bipush 42 ;; 42 ;; 3 . 42
istore_0 ;; i = ;; 3 ;; 0:42
iload_0 ;; i = 42 ;; 3 . 42
imul ;; 3 * (i = 42) ;; 126
iload_0 ;; i ;; 126 . 42
dup ;; ;; 126 . 42 . 42
iconst_1 ;; 1 ;; 126 . 42 . 42 . 1
iadd ;; i + 1 ;; 126 . 42 . 43
istore_0 ;; i++ ;; 126 . 42 ;; 0:43
iadd ;; 3 * (i = 42) + i++ ;; 168
i2d ;; (double) ;; 168.0
dstore_1 ;; x = 3 * (i = 42) + i++ ;; ;; 0:43, 1:168.0Maximum stack size: 4 (32bit words)
JVM instruction small-step semantics (without jumps):
I : ⟨V,S⟩ → ⟨V',S'⟩
I : JVM instruction (or instruction sequence)
V / V' : variable store before/after
S / S' : stack before/after
Some jump-free instructions:
| I | V | S | V' | S' |
|---|---|---|---|---|
iconst i |
V | S | V | S.i |
bipush i |
V | S | V | S.i |
iadd |
V | S.i.j | V | S.(i+j) |
imul |
V | S.i.j | V | S.(i*j) |
i2d |
V | S.i | V | S.(double)i |
dup |
V | S.i | V | S.i.i |
istore a |
V | S.i | V[a:i] | S |
dstore a |
V | S.d | V[a:d] | S |
iload a |
V | S | V | S.V[a] |
dload a |
V | S | V | S.V[a] |
Example:
bool inside(int i, int j, int k) {
return i <= j && j < k;
}Machines typically do not have boolean operations; use jumps instead.
.method public static inside(III)Z
.limit locals 3
.limit stack 2
iload_0 ;; i
iload_1 ;; j
if_icmpgt Lfalse ;; i <= j ;; false if i > j
iload_1 ;; j
iload_2 ;; k
if_icmpge Lfalse ;; j < k ;; false if j >= k
iconst_1 ;; true
goto Ldone
Lfalse:
iconst_0 ;; false
Ldone:
ireturn ;; return
.end methodJVM instruction small-step semantics (with jumps):
I : ⟨P,V,S⟩ → ⟨P',V',S'⟩
P / P' : code position (program counter) before/after
| I | P | V | S | P' | V' | S' | Condition |
|---|---|---|---|---|---|---|---|
goto L |
P | V | S | L | V | S | |
ifeq L |
P | V | S.i | L | V | S | i = 0 |
ifeq L |
P | V | S.i | P+1 | V | S | i ≠ 0 |
ifne L |
P | V | S.i | L | V | S | i ≠ 0 |
ifne L |
P | V | S.i | P+1 | V | S | i = 0 |
if_icmpgt L |
P | V | S.i.j | L | V | S | i > j |
if_icmpgt L |
P | V | S.i.j | P+1 | V | S | i ≤ j |
Signature (global symbol table):
Context (local symbol table):
newLocal)lookupVar)newBlock,
popBlock)State
emit)The emit function:
emit!iconst if ≥ -1 and ≤ 5 (takes 1 byte)bipush if ≥ -128 and ≤ 127 (takes 2 bytes)ldc otherwise (takes ~ 5 bytes)We specify the compiler by so-called compilation schemes (pseudo-code describing syntax-directed traversal).
An expression of type t is compiled to instructions
whose effect is to leave a new value of type t on top of
the stack, and leave the rest of the stack unchanged.
compileExp(EInt i):
emit(iconst i)
compileExp(EAdd t e₁ e₂):
compileExp(e₁)
compileExp(e₂)
emit(t-add) -- either iadd or dadd
compileExp(EVar x):
a <- lookupVar x
emit(t-load a) -- either iload or dload
compileExp(ECall x es)
for (e ∈ es): -- compile function arguments
compileExp(e)
f <- lookupFun(x) -- get the Jasmin name of function x
emit(invokestatic f) -- emit function call
compileExp(EAss x e):
a <- lookupVar x
compileExp(e)
emit(t-store a) -- either istore or dstore
-- Problem hereJVM Small-step semantics without jumps:
i : ⟨V,S⟩ → ⟨V',S'⟩
i : JVM instruction (or instruction sequence)
V / V' : variable store before/after
S / S' : stack before/after
We say γ ~ V if environment γ translates to
variable store V.
Correctness statement (simplified):
If γ ⊢ e ⇓ ⟨v, γ'⟩ and γ ~ V then compileExp(e) : ⟨V,S⟩ →* ⟨V',S.v⟩ such that γ' ~ V'.
Correct translation of assignment:
compileExp(EAss x e):
a <- lookupVar x
compileExp(e)
emit(t-store a)
emit(t-load a)A statement (sequence) is compiled to instructions whose effect on the stack is nil (no change).
compileStm(SInit t x e):
a <- newLocal t x
compileExp(e)
emit(t-store a) -- either istore or dstore
compileStm(SExp t e):
compileExp(e)
emit(t-pop) -- either pop or pop2
compileStm(SBlock ss):
newBlock
for (s : ss)
compileStm(s)
popBlock
compileStm(SWhile e s): -- s is a SBlock
LStart, LEnd <- newLabel
emit(LStart:)
compileExp(e) -- executing e leaves boolean on stack
emit(ifeq LEnd) -- if "false" end loop
compileStm(s) -- body of loop
emit(goto LStart) -- recheck loop condition
emit(LEnd:)Correctness statement (simplified):
If γ ⊢ s ⇓ γ' and γ ~ V then compileStm(s) : ⟨V,S⟩ →* ⟨V',S⟩ such that γ' ~ V'.
(We omit emit in the following.)
The simplest compiler will compile boolean expressions to
instructions that leave either 0 (for false)
or 1 (for true) on the stack.
However, this leads to convoluted translation of control-flow statements
like if and while.
Example:
while (i < j) {...}becomes
L0: ;; beginning of loop, check condition
iload_1 ;; i
iload_0 ;; j
if_icmplt L2 ;; i < j ?
iconst_0 ;; no: put false
goto L3
L2: ;; yes: put true
iconst_1
L3: ;; boolean value of i < j is on top of the stack
ifeq L1 ;; if top of stack is 0, exit while loop
...
goto L0 ;; repeat loop
L1: ;; end of loopWe would like to skip the computation of the boolean value, but jump directly according to the condition:
L0: ;; beginning of loop, check condition
iload_1 ;; i
iload_0 ;; j
if_icmpge L1 ;; exit loop if i ≥ j
...
goto L0 ;; repeat loop
L1: ;; end of loopWe can get close to this in a compositional way by compiling boolean
expressions as jumps to either label Ltrue (if the
expression will get value true) or label
Lfalse (otherwise).
compileBool (Exp e, Label Ltrue, Label Lfalse)
For compiling control-flow, we use it as follows:
compileStm(SWhile e s):
Lstart, Ltrue, Lfalse ← newLabel
Lstart:
compileBool (e, Ltrue, Lfalse)
Ltrue:
compileStm(s)
goto Lstart
Lfalse:
compileStm(SIfElse e s₁ s₂):
... compileBool is given by the following compilation
schemes:
comparison operators:
compileBool(ELt int e₁ e₂, Ltrue, Lfalse):
compileExp(e₁)
compileExp(e₂)
if_icmplt Ltrue
goto Lfalse
compileBool(EGt t e₁ e₂, Ltrue, Lfalse):
... logical operators:
compileBool(ETrue, Ltrue, Lfalse):
goto LTrue
compileBool(EFalse, Ltrue, Lfalse):
goto LFalse
compileBool(EAnd e₁ e₂, Ltrue, Lfalse):
Ltrue' ← newLabel
compileBool(e₁, Ltrue', Lfalse)
Ltrue':
compileBool(e₂, Ltrue, Lfalse)
compileBool(EOr e₁ e₂, Ltrue, Lfalse):
... Sometimes booleans need to be represented by 0 and 1,
e.g. when assigning to a boolean variable:
compileExp(e) | typeOf(e) == bool:
Ltrue, Lfalse, Ldone <- newLabel
compileBool(e, Ltrue, Lfalse)
Ltrue:
iconst_1
goto Ldone
Lfalse:
pop
iconst_0
Ldone: The compileBool(e, Ltrue, Lfalse) function does not
utilize fall-through in many cases, and thus uses more labels than
strictly needed.
An alternative is a compileBool(e, Lfalse) that produces
code that would jump to Lfalse whenever e
evaluates to false, and fall through otherwise.
compileBool(EFalse, l):
goto l
compileBool(ETrue, l):
-- emit nothing
compileBool(EAnd e₁ e₂, l):
compileBool(e₁, l)
compileBool(e₂, l)
-- if we had negation:
compileBool(ENot e, l):
lTrue ← newLabel
compileBool(e, lTrue)
goto l
lTrue:
compileBool(EOr e₁ e₂, l):
lFalse, lTrue <- newLabel
compileBool(e₁, lFalse)
goto lTrue
lFalse:
compileBool(e₂, l)
lTrue:
compileBool(ELt int e₁ e₂, l):
compileExp(e₁)
compileExp(e₂)
if_icmpge l