Programming Language Technology, DAT151/DIT231
(Additional notes complementing the slides.)
Produce binary code or symbolic assembly.
.class
file by jasmin
.Translate variable names to addresses
Translate expression trees to stack instruction sequences
Translate control structures and boolean operations to jumps
Translate function calls into machine-level calls
Jasmin types:
I
: int
D
: double
V
: void
Z
: boolean
void foo // size=0 max=0
( double x // x:0 size=2 max=2
, int y) { // y:2 size=3 max=3
int i; // i:3 size=4 max=4
{ // newBlock
double y; // y:4 size=6 max=6
bool b; // b:6 size=7 max=7
} // popBlock size=4 max=7
int j; // j:4 size=5 max=7
}
.method public static foo(DI)V
.limit locals 7
.end method
Example:
int main() {
int i; // i:0
double x = 3 * (i = 42) + i++; // x:1
}
Translates to JVM:
.method public static main()I4
.limit stack
;; instr ;; expr ;; stack (S) ;; store (V)
;; 3 ;; 3
iconst_3 42 ;; 42 ;; 3 . 42
bipush ;; i = ;; 3 ;; 0:42
istore_0 ;; i = 42 ;; 3 . 42
iload_0 ;; 3 * (i = 42) ;; 126
imul ;; i ;; 126 . 42
iload_0 ;; ;; 126 . 42 . 42
dup ;; 1 ;; 126 . 42 . 42 . 1
iconst_1 ;; i + 1 ;; 126 . 42 . 43
iadd ;; i++ ;; 126 . 42 ;; 0:43
istore_0 ;; 3 * (i = 42) + i++ ;; 168
iadd ;; (double) ;; 168.0
i2d ;; x = 3 * (i = 42) + i++ ;; ;; 0:43, 1:168.0 dstore_1
Maximum stack size: 4 (32bit words)
JVM instruction small-step semantics (without jumps):
I : ⟨V,S⟩ → ⟨V',S'⟩
I : JVM instruction (or instruction sequence)
V / V' : variable store before/after
S / S' : stack before/after
Some jump-free instructions:
I | V | S | V' | S' |
---|---|---|---|---|
iconst i |
V | S | V | S.i |
bipush i |
V | S | V | S.i |
iadd |
V | S.i.j | V | S.(i+j) |
imul |
V | S.i.j | V | S.(i*j) |
i2d |
V | S.i | V | S.(double)i |
dup |
V | S.i | V | S.i.i |
istore a |
V | S.i | V[a:i] | S |
dstore a |
V | S.d | V[a:d] | S |
iload a |
V | S | V | S.V[a] |
dload a |
V | S | V | S.V[a] |
Example:
bool inside(int i, int j, int k) {
return i <= j && j < k;
}
Machines typically do not have boolean operations; use jumps instead.
.method public static inside(III)Z3
.limit locals 2
.limit stack
;; i
iload_0 ;; j
iload_1 ;; i <= j ;; false if i > j
if_icmpgt Lfalse ;; j
iload_1 ;; k
iload_2 ;; j < k ;; false if j >= k
if_icmpge Lfalse
;; true
iconst_1
goto Ldone
Lfalse: ;; false
iconst_0
Ldone: ;; return
ireturn
.end method
JVM instruction small-step semantics (with jumps):
I : ⟨P,V,S⟩ → ⟨P',V',S'⟩
P / P' : code position (program counter) before/after
I | P | V | S | P' | V' | S' | Condition |
---|---|---|---|---|---|---|---|
goto L |
P | V | S | L | V | S | |
ifeq L |
P | V | S.i | L | V | S | i = 0 |
ifeq L |
P | V | S.i | P+1 | V | S | i ≠ 0 |
ifne L |
P | V | S.i | L | V | S | i ≠ 0 |
ifne L |
P | V | S.i | P+1 | V | S | i = 0 |
if_icmpgt L |
P | V | S.i.j | L | V | S | i > j |
if_icmpgt L |
P | V | S.i.j | P+1 | V | S | i ≤ j |
Signature (global symbol table):
Context (local symbol table):
newLocal
)lookupVar
)newBlock
,
popBlock
)State
emit
)The emit
function:
emit
!iconst
if ≥ -1 and ≤ 5 (takes 1 byte)bipush
if ≥ -128 and ≤ 127 (takes 2 bytes)ldc
otherwise (takes ~ 5 bytes)We specify the compiler by so-called compilation schemes (pseudo-code describing syntax-directed traversal).
An expression of type t
is compiled to instructions
whose effect is to leave a new value of type t
on top of
the stack, and leave the rest of the stack unchanged.
EInt i):
compileExp(
emit(iconst i)
EAdd t e₁ e₂):
compileExp(
compileExp(e₁)
compileExp(e₂)-add) -- either iadd or dadd
emit(t
EVar x):
compileExp(<- lookupVar x
a -load a) -- either iload or dload
emit(t
ECall x es)
compileExp(: -- compile function arguments
for (e ∈ es)
compileExp(e)<- lookupFun(x) -- get the Jasmin name of function x
f -- emit function call
emit(invokestatic f)
EAss x e):
compileExp(<- lookupVar x
a
compileExp(e)-store a) -- either istore or dstore
emit(t-- Problem here
JVM Small-step semantics without jumps:
i : ⟨V,S⟩ → ⟨V',S'⟩
i : JVM instruction (or instruction sequence)
V / V' : variable store before/after
S / S' : stack before/after
We say γ ~ V
if environment γ
translates to
variable store V
.
Correctness statement (simplified):
If γ ⊢ e ⇓ ⟨v, γ'⟩ and γ ~ V then compileExp(e) : ⟨V,S⟩ →* ⟨V',S.v⟩ such that γ' ~ V'.
Correct translation of assignment:
EAss x e):
compileExp(<- lookupVar x
a
compileExp(e)-store a)
emit(t-load a) emit(t
A statement (sequence) is compiled to instructions whose effect on the stack is nil (no change).
SInit t x e):
compileStm(<- newLocal t x
a
compileExp(e)-store a) -- either istore or dstore
emit(t
SExp t e):
compileStm(
compileExp(e)-pop) -- either pop or pop2
emit(t
SBlock ss):
compileStm(
newBlock: ss)
for (s
compileStm(s)
popBlock
SWhile e s): -- s is a SBlock
compileStm(LStart, LEnd <- newLabel
LStart:)
emit(
-- executing e leaves boolean on stack
compileExp(e) LEnd) -- if "false" end loop
emit(ifeq
-- body of loop
compileStm(s) LStart) -- recheck loop condition
emit(goto
LEnd:) emit(
Correctness statement (simplified):
If γ ⊢ s ⇓ γ' and γ ~ V then compileStm(s) : ⟨V,S⟩ →* ⟨V',S⟩ such that γ' ~ V'.
(We omit emit
in the following.)
The simplest compiler will compile boolean expressions to
instructions that leave either 0
(for false
)
or 1
(for true
) on the stack.
However, this leads to convoluted translation of control-flow statements
like if
and while
.
Example:
while (i < j) {...}
becomes
;; beginning of loop, check condition
L0: ;; i
iload_1 ;; j
iload_0 ;; i < j ?
if_icmplt L2 ;; no: put false
iconst_0
goto L3;; yes: put true
L2:
iconst_1;; boolean value of i < j is on top of the stack
L3: ;; if top of stack is 0, exit while loop
ifeq L1 ...
;; repeat loop
goto L0 ;; end of loop L1:
We would like to skip the computation of the boolean value, but jump directly according to the condition:
;; beginning of loop, check condition
L0: ;; i
iload_1 ;; j
iload_0 ;; exit loop if i ≥ j
if_icmpge L1 ...
;; repeat loop
goto L0 ;; end of loop L1:
We can get close to this in a compositional way by compiling boolean
expressions as jumps to either label Ltrue
(if the
expression will get value true
) or label
Lfalse
(otherwise).
compileBool (Exp e, Label Ltrue, Label Lfalse)
For compiling control-flow, we use it as follows:
SWhile e s):
compileStm(Lstart, Ltrue, Lfalse ← newLabel
Lstart:
Ltrue, Lfalse)
compileBool (e, Ltrue:
compileStm(s)Lstart
goto Lfalse:
SIfElse e s₁ s₂):
compileStm(...
compileBool
is given by the following compilation
schemes:
comparison operators:
ELt int e₁ e₂, Ltrue, Lfalse):
compileBool(
compileExp(e₁)
compileExp(e₂)Ltrue
if_icmplt Lfalse
goto
EGt t e₁ e₂, Ltrue, Lfalse):
compileBool(...
logical operators:
ETrue, Ltrue, Lfalse):
compileBool(LTrue
goto
EFalse, Ltrue, Lfalse):
compileBool(LFalse
goto
EAnd e₁ e₂, Ltrue, Lfalse):
compileBool(Ltrue' ← newLabel
Ltrue', Lfalse)
compileBool(e₁, Ltrue':
Ltrue, Lfalse)
compileBool(e₂,
EOr e₁ e₂, Ltrue, Lfalse):
compileBool(...
Sometimes booleans need to be represented by 0 and 1,
e.g. when assigning to a boolean variable:
| typeOf(e) == bool:
compileExp(e) Ltrue, Lfalse, Ldone <- newLabel
Ltrue, Lfalse)
compileBool(e, Ltrue:
iconst_1Ldone
goto
Lfalse:
pop
iconst_0
Ldone:
The compileBool(e, Ltrue, Lfalse)
function does not
utilize fall-through in many cases, and thus uses more labels than
strictly needed.
An alternative is a compileBool(e, Lfalse)
that produces
code that would jump to Lfalse
whenever e
evaluates to false
, and fall through otherwise.
EFalse, l):
compileBool(
goto l
ETrue, l):
compileBool(-- emit nothing
EAnd e₁ e₂, l):
compileBool(
compileBool(e₁, l)
compileBool(e₂, l)
-- if we had negation:
ENot e, l):
compileBool(
lTrue ← newLabel
compileBool(e, lTrue)
goto l:
lTrue
EOr e₁ e₂, l):
compileBool(<- newLabel
lFalse, lTrue
compileBool(e₁, lFalse)
goto lTrue:
lFalse
compileBool(e₂, l):
lTrue
ELt int e₁ e₂, l):
compileBool(
compileExp(e₁)
compileExp(e₂) if_icmpge l