Programming Language Technology, DAT151/DIT231
Why an interpreter if we can have a compiler?
I
for new language X
in
existing language Y
.C
for X
in
X
.I
) C
on C
to get a
compiled compiler C'
.O
for X
in
X
.C'
on O
to get an a compiled
optimizing compiler Cₒ
.Cₒ
on Cₒ
to get an optimized
optimizing compiler Cₒ'
.An interpreter should be compositional!
⟦op (e₁, e₂)⟧ = ⟦op⟧ (⟦e₁⟧, ⟦e₂⟧)
eval (Op(e₁,e₂)): v₁ ← eval e₁ v₂ ← eval e₂ v ← ...combine v₁ and v₂ according to Op... return v
An interpreter can be specified by:
⟦_⟧ : Exp → Val
(set
theory, domain theory)._⇓_ ⊆ Exp × Val
(big-step operational
semantics)._→_ ⊆ Exp × Exp
(small-step
operational semantics).We interpret type-checked programs only, since the meaning of overloaded operators depends on their type.
(To avoid clutter, we reuse the names of the untyped ASTs, e.g.
Exp
instead ExpT
etc.)
In analogy to type-checking Γ ⊢ e : t
we have
judgement
γ ⊢ e ⇓ v
"In environment
γ
, expressione
has valuev
."
Environments γ
are similar to contexts Γ
,
but instead of mapping variables x
to types t
,
they map variables to values v
.
An environment γ
is a stack of blocks δ
,
which are finite maps from variables x
to values
v
.
Values are integer, floating-point, and boolean literals, or a
special value null
for being undefined.
This would be an LBNF grammar for our values:
VInt. Val ::= Integer;
VDouble. Val ::= Double;
VBool. Val ::= Bool;
VNull. Val ::= "null";
BTrue. Bool ::= "true";
BFalse. Bool ::= "false";
(It is possible but not necessary to use BNFC to generate the value representation.)
Variable rule.
---------- γ(x) = v γ ⊢ x ⇓ v
Note that we take the value of the "topmost" x
---it may
appear in several blocks δ
of γ
.
{ double z = 3.14; // (z=3.14)
int y = 1; // (z=3.14,y=1)
{ int x = 0; // (z=3.14,y=1).(x=0)
int z = x + y; // (z=3.14,y=1).(x=0,z=1)
(z); // 1
printInt}
(z); // 3.14
printDouble}
The meaning of an arithmetic operator depends on its static type.
γ ⊢ e₁ ⇓ v₁ γ ⊢ e₂ ⇓ v₂ --------------------------- v = divide(t,v₁,v₂) γ ⊢ e₁ /ₜ e₂ ⇓ v
divide(int,v₁,v₂)
is integer division of the integer
literals v₁
by v₂
.divide(double,v₁,v₂)
is floating-point division of the
floating-point literals v₁
by v₂
. Integer
literals will be converted to floating-point first.divide(bool,v₁,v₂)
is undefined.Judgement γ ⊢ e ⇓ v
should be read as a function with
inputs γ
and e
and output v
.
eval(Env γ, Exp e): Val
eval(γ, EId x) = lookup(γ,x)
eval(γ, EInt i) = VInt i
eval(γ, EDouble d) = VDouble d
eval(γ, EDiv t e₁ e₂) = divide(t, eval(γ,e₁), eval(γ,e₂))
In the last clause, eval(γ,e₁)
and
eval(γ,e₂)
can be run in any order, even in parallel!
Type soundness (weak correctness):
If
e : t
ande ⇓ v
thenv : t
.
"If expression e
has type t
and
e
evaluates to value v
then v
also has type t
."
Termination (strong correctness):
If
e : t
thene ⇓ v
for somev : t
.
Allowing non-termination:
If
e : t
then either evaluation ofe
diverges, ore ⇓ v
withv : t
.
Quiz:
Which of these hold in exceptional cases?
Is this definition type-sound?
eval(γ, EDiv double e₁ e₂) = VDouble 0.0
Expression forms like increment (++x
and
x++
) and decrement and assignment x = e
in
general change the values of variables.
This is called a (side) effect.
(In contrast, the type of variables never changes. Typing has
no effects.)
We need to update the environment.
We return the updated environment along with the value:
γ ⊢ e ⇓ ⟨v; γ'⟩
"In environment
γ
, expressione
has valuev
and updates environment toγ'
."
We write γ[x=v]
for updating the topmost
(innermost) occurrence of x
in environment γ'
to value v
.
(Note x
may appear in several blocks δ
of
γ
.)
Rules.
-------------- γ(x) = v γ ⊢ x ⇓ ⟨v; γ⟩ -------------------------------- γ(x) = i γ ⊢ ++(int)x ⇓ ⟨i+1; γ[x = i+1]⟩ γ ⊢ e ⇓ ⟨v; γ'⟩ -------------------------- γ ⊢ (x = e) ⇓ ⟨v; γ'[x=v]⟩ γ ⊢ e₁ ⇓ ⟨v₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨v₂;γ₂⟩ ------------------------------------- v = divide(t,v₁,v₂) γ ⊢ e₁ /ₜ e₂ ⇓ ⟨v;γ₂⟩
The last rule shows: we need to thread the environment through the judgements.
eval(Env γ, Exp e): Val × Env
eval(γ, EId x) =
⟨ lookup(γ,x), γ ⟩
eval(γ, EAssign x e) = let
⟨v,γ'⟩ = eval(γ,e)
in ⟨ v, update(γ',x,v) ⟩
eval(γ, EDiv t e₁ e₂) = let
⟨v₁,γ₁⟩ = eval(γ, e₁)
⟨v₂,γ₂⟩ = eval(γ₁,e₂)
in ⟨ divide(t,v₁,v₂), γ₂ ⟩
The evaluation order matters now!
Consider
x=0 ⊢ x + x++
vs.
x=0 ⊢ x++ + x.
E.g. in Java, we can use an environment env
global to
the interpreter and mutate it with update.
eval(Exp e): Val
eval(EId x):
return env.lookup(x)
eval(EAssign x e):
v ← eval(e)
env.update(x,v)
return v
eval(EDiv t e₁ e₂):
v₁ ← eval(e₁)
v₂ ← eval(e₂)
return divide(t,v₁,v₂)
This keeps the environment implicit.
(But careful with function calls, see below!)
In Haskell, we can use the state monad for the same purpose.
import Control.Monad.State (State, get, modify, evalState)
: Exp → State Env Val
eval
EId x) = do
eval (
γ ← getreturn (lookupVar γ x)
EAssign x e) = do
eval (
v ← eval e
modify (λ γ → updateVar γ x v)return v
EDiv TInt e₁ e₂) = do
eval (VInt i₁ ← eval e₁
VInt i₂ ← eval e₂
return (VInt (div i₁ i₂))
: Exp → Val
interpret = evalState (eval e) emptyEnv interpret e
The Haskell state monad is just sugar. It is implemented roughly by:
type State s a = s → (a, s)
get :: State s s -- s → (s, s)
= (s, s)
get s
modify :: (s → s) → State s () -- s → ((), s)
= ((), f s)
modify f s
return :: a → State s a -- s → (a, s)
return a s = (a, s)
do x ← p; q(x)) s = let
(= p s
(a, s₁) = q(a) s₁
(b, s₂) in (b, s₂)
We need to start with good environments
γ : Γ
.
This is defined pointwise: γ(x) : Γ(x)
for all
x
in scope.
Type soundness (weak correctness):
If
Γ ⊢ e : t
andγ : Γ
andγ ⊢ e ⇓ ⟨v; γ'⟩
thenv : t
andγ' : Γ
.
"If expression e
has type t
in context
Γ
and γ
is an environment matching
Γ
and e
evaluates to value v
and
γ'
then v
also has type t
and
γ'
also matches Γ
."
Termination (strong correctness):
If
Γ ⊢ e : t
andγ : Γ
thenγ ⊢ e ⇓ ⟨v; γ'⟩
for somev : t
andγ' : Γ
.
Allowing non-termination:
... (Exercise)
Input and output:
---------------------- read i from stdin γ ⊢ readInt() ⇓ ⟨i; γ⟩ γ ⊢ e ⇓ ⟨i; γ'⟩ ---------------------------- print i on stdout γ ⊢ printInt(e) ⇓ ⟨null; γ'⟩
We do not model input and output mathematically here.
Note that read i from stdin
can also abort with an
exception.
Bonus exercise: Could we model evaluation with input and output as a
mathematical function?
Hint:
return
statement
below.)Statements do not have a value. They just have effects.
γ ⊢ s ⇓ γ'
γ ⊢ ss ⇓ γ'
"In environment
γ
, statements
(sequencess
) executes successfully, returning updated environmentγ'
."
Sequencing γ ⊢ ss ⇓ γ'
:
γ ⊢ ε ⇓ γ γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ γ₂ --------------------------- γ ⊢ s ss ⇓ γ₂
An empty sequence is interpreted as the identity function
id : Env → Env
,
sequence composition as function composition.
Blocks.
γ. ⊢ ss ⇓ γ'.δ -------------- γ ⊢ {ss} ⇓ γ'
Variable declaration.
γ.δ[x=null] ⊢ e ⇓ ⟨v, γ'.δ'⟩ ---------------------------- γ.δ ⊢ t x = e; ⇓ γ'.δ'[x=v]
While.
γ ⊢ e ⇓ ⟨false; γ'⟩ -------------------- γ ⊢ while (e) s ⇓ γ' γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁. ⊢ s ⇓ γ₂.δ γ₂ ⊢ while e (s) ⇓ γ₃ ------------------------------------------------------------- γ ⊢ while (e) s ⇓ γ₃
Or:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ γ₃ ------------------------------------------------- γ ⊢ while (e) s ⇓ γ₃
Return?
γ ⊢ e ⇓ ⟨v, γ'⟩ ------------------- γ ⊢ return e; ⇓ ??
Return alters the control flow:
We can exit from the middle of a loop, block etc.!
bool prime (int p) {
if (p <= 2) return p == 2;
if (divides(2,p)) return false;
{
int q = 3;
while (q * q <= p) {
if (divides(q,p)) return false;
= q + 2;
q }
}
return true;
}
Similar to return
: break
,
continue
.
return
can be modelled as an exception that
carries the return value v
as exception information.
When calling a function, we handle the exception, treating it as regular
return from the function.
γ ⊢ s ⇓ r
wherer ::= v | γ'
"In environment
γ
statements
executes successfully,
either asking to return valuev
,
or to continue execution in updated environmentγ'
."
The disjoint union Val ⊎ Env
could be modeled in LBNF
as:
Return. Result ::= "return" Val;
Continue. Result ::= "continue" Env;
Return.
γ ⊢ e ⇓ ⟨v, γ'⟩ ------------------ γ ⊢ return e; ⇓ v
Statement as expression.
γ ⊢ e ⇓ ⟨v, γ'⟩ --------------- γ ⊢ e; ⇓ γ'
Sequence: Propagate exception.
γ ⊢ s ⇓ v ------------- γ ⊢ s ss ⇓ v γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ r ------------------------- γ ⊢ s ss ⇓ r
Exercise: How to change while
?
One solution:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ r ------------------------------------------------ γ ⊢ while (e) s ⇓ r
Function call.
γ ⊢ e₁ ⇓ ⟨v₁, γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨v₂, γ₂⟩ ... γₙ₋₁ ⊢ eₙ ⇓ ⟨vₙ, γₙ⟩ (x₁=v₁,...,xₙ=vₙ) ⊢ ss ⇓ v -------------------------- σ(f) = t f (t₁ x₁,...,tₙ xₙ) { ss } γ ⊢ f(e₁,...,eₙ) ⇓ ⟨v, γₙ⟩
To implement the side condition, we need a global map σ
from function names f
to their definition
t f (t₁ x₁,...,tₙ xₙ) { ss }
.
Quiz: In the last rule, what if ss
does not contain a
return
statement?
In Java, we can use Java's exception mechanism.
eval(ECall f es):
eval(es) // Evaluate list of arguments
vs ← (Δ,ss) ← lookupFun(f) // Get parameters and body of function
saveEnv() // Save current environment
γ ← setEnv(makeEnv(Δ,vs)) // New environment binding the parameters
try {
execStms(ss) // Run function body
} catch {
:
ReturnException vsetEnv(γ) // Restore environment
return v // Evaluation result is returned value
}
execStm(SReturn e):
eval(e)
v ← throw new ReturnException(v)
In Haskell, we can use the exception monad.
import Control.Monad
import Control.Monad.Except
import Control.Monad.Reader
import Control.Monad.State
type M = ReaderT Sig (StateT Env (ExceptT Val IO))
eval :: Exp → M Val
ECall f es) = do
eval (mapM eval es
vs ←
(Δ,ss) ← lookupFun f
γ ← get
put (makeEnv Δ vs)`catchError` λ v → do
execStms ss
put γreturn v
execStm :: Stm → M ()
SReturn e) = do
execStm (
v ← eval e throwError v
N.B.:
makeEnv (x₁:t₁, ..., xₙ:tₙ) (v₁, ..., vₙ) = (x₁=v₁, ... xₙ=tₙ)
.
A program is interpreted by executing the statements of the
main
function in an environment with just one empty
block.
What is wrong with this rule?
γ ⊢ e₁ ⇓ ⟨b₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b₂;γ₂⟩ ------------------------------------ b = b₁ ∧ b₂ γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
Here are some examples where we would like to shortcut computation:
int b = 1;
if (b == 0 && the_goldbach_conjecture_holds_up_to_10E100) { ... }
int x = 0 * number_of_atoms_on_the_moon;
Short-cutting logical operators like &&
is
essentially used in C, e.g.:
if (p != NULL && p[0] > 10)
Without short-cutting, the program would crash in p[0]
by accessing a NULL
pointer.
Rules with short-cutting:
γ ⊢ e₁ ⇓ ⟨false; γ₁⟩ ------------------------- γ ⊢ e₁ && e₂ ⇓ ⟨false; γ₁⟩ γ ⊢ e₁ ⇓ ⟨true; γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b; γ₂⟩ ---------------------------------------- γ ⊢ e₁ && e₂ ⇓ ⟨b; γ₂⟩
The effects of e₂
are not executed if e₁
already evaluated to false
.
Example:
while (x < 10 && f(x++)) { ... }
You can circumvent this by defining your own and
:
bool and (bool x, bool y) { return x && y; }
while (and (x < 10, f(x++))) { ... }
(This is properly covered in the lecture on functional languages and in lab 4.)
Example 1:
double (x) = x + x
double (1+2)
... = double (3) = 3 + 3 = 6
... = (1+2) + (1+2) = 3 + (1+2) = 3 + 3 = 6
... = let x=1+2 in x + x = let x = 3 in x + x = 3 + 3 = 6
Example 2:
zero (x) = 0
zero (1+2)
... = zero (3) = 0
... = 0
... = let x=1+2 in 0 = 0
Languages with effects (such as C/C++) mostly use
call-by-value.
Haskell is pure (no effects) and uses call-by-need, which refines
call-by-name.
Andreas Abel, updated 2023-11-16