Typechecking a Purity Program

by Phil Freeman on 2011/05/02


The Purity compiler uses a modified version of Algorithm W to verify the type safety of its programs. I am going to write a little about the common implementation of Algorithm W in the simply typed lambda calculus, and then its relationship with the Purity typechecking algorithm.

Algorithm W

Algorithm W is a bottom-up constraint-based type inference algorithm for the simply-typed lambda calculus enriched with polymorphic let expressions. The algorithm generates a set of type constraints from an untyped term using the Robinson Unification method. If unification fails at any point then the expression cannot be typed.

In the second phase of the algorithm, the generated constraints are solved to yield the final typed expression.

Alternatively, the bottom-up constraint generation phase can be fused with the constrant solving phase to create a one-pass type checking algorithm, but I have not adopted this approach here, in the interests of clarity.

Generating Constraints

The core of the constraint generation algorithm is a modified version of the Robinson Unification algorithm.

The type system is augmented with a countably infinite supply of unknown types. Unknown types are labelled in the code by integers, and unify with any other type. Unification of compound types is defined by structural recursion.

A constraint is a pair of (the index of) an unknown type and a type. A constraint set is a collection of constraints, and may contain a constraint for the same unknown many times.

The pseudo-code for the method unify is shown below.

unify : Type -> Type -> [Constraint]

unify T T = []

unify x t = unify t x = [x = t]

unify (a -> b) (c -> d) = unify a c :: unify b d

unify (a + b) (c + d) = unify a c :: unify b d

unify (a . b) (c . d) = unify a c :: unify b d

unify (C s1, ..., sn) (C t1, ..., tn) = unify s1 t1 :: ... :: unify sn tn

unify _ _ = fail

Here, x is any unknown, T is a type parameter and C is an n-ary type constructor.

Construction of a constraint set proceeds bottom-up using unify to equate types of subexpressions with the appropriate part of the type of a containing expression.

The method generate-constraints takes an untyped term and a type environment (a mapping from variable names to types) and returns a type and a set of constraints.

A fragment of the pseudo-code for generate-constraints is shown below. The analysis for the other cases is similar.

Environment = string -> Type

Constraint = (int, Type)

generate-constraints : Term -> Environment -> (Type, [Constraint])

generate-constraints (f x, e) =
  let (T1, C1) = generate-constraints f e in
  let (T2, C2) = generate-constraints x e in
  let T3 = new unknown type in
  (T3, unify T1 (T2 -> T3) :: C1 :: C2)

generate-constraints (g . f, e) =
  let (T1, C1) = generate-constraints f e in
  let (T2, C2) = generate-constraints g e in
  let T3, T4, T5 = new unknown types in
  (T3 -> T5, C1 :: C2 :: unify T1 (T3 -> T4) :: unify T2 (T4 -> T5))

generate-constraints (v, e) = (e[v], []) 

generate-constraints (v => t, e) =  
  let T = new unknown type in 
  let e' = e :: [v : T] in
  let (T1, C1) = generate-constraints t e' in (T -> T1, C1) 

...

Solving Constraints

Once the constraints have been generated, they can be solved by substitution to yield the type of the initial expression. If one annotates all subexpressions with unknown types as they are generated during the constraint collection phase then a typed expression can be constructed which contains the types of all subexpressions.

We maintain a mutable set of constraints and begin by adding all constraints generated in the first phase of the algorithm. At each step, a constraint is removed from the set and applied to both the solution set and the remaining constraints.

The method by which a constraint is applied to another constraint is determined by case analysis and a unification step may result in new constraints being added to the set. The algorithm can therefore be seen as a producer/consumer problem with constraints being generated by unification of types, and constraints being applied to the solution set one-by-one. When the set is empty and there are no more constraints to apply, type checking is complete and one can read off the values of the unknown types from the solution set.

The method solve-constraints takes a constraint set (a mapping from unknowns to types) and a solution set, (also a mapping from unknowns to types), and returns a modified solution set.

The pseudo-code for solve-constraints is as follows:

SolutionSet = int -> Type
 
solve-constraints : [Constraint] -> SolutionSet -> SolutionSet
 
solve-constraints [] _ = []
 
solve-constraints ((x, T0) :: cs) s =
  let s0 = replace x, with T0 in s
  let cs' = [c1 | c1 <- substitute (x, T0) c0 ; c0 <- cs]
  solve-constraints cs' s0  

where

substitute Constraint -> Constraint -> [Constraint]

substitute (x, T0) (x, T1) = unify T0 T1

substitute (x, T0) (y, T1) =
  let T2 = replace x with T in
  (y, T2) (if x /= y)

That is, if two constraints have the same unknown type on their left-hand sides, then we unify the types on their right-hand sides and add the result to the end of the queue of constraints not yet substituted. If the left-hand sides are different, then we substitute the right hand side of the first constraint for the left hand side of the first constraint into the right hand side of the second constraint.

We can see that this process terminates since any unknown will be eliminated from the constraint set and from the solution set the first time it is encountered as the left-hand side of a constraint.

Worked Examples

Now let's see a few worked examples to illustrate the concepts.

Example 1 - The pairing function

Let x = curry id.

We gather constraints from the bottom up. Let the identity term have type T0 -> T0 and the curried function have type T1 -> T2 -> T3. We have that the type of the argument to the curry function is T1 . T2 -> T3.

Unifying this type with the type of the identity term gives the constraint set

T0 = T1 . T2
T0 = T3 

We substitute the first constraint into the (empty) solution set to get {T0 : T1 . T2}.

The unknowns on the left hand sides of the two constraints are equal, so we unify the right hand sides, and the solution set reduces to the singleton

T3 = T1 . T2

Substituting this final constraint into the solution set gives the final solution set as {T0 , T3 : T1 . T2} and the type of the original expression as

forall Ti . T1 -> T2 -> T1 . T2

Example 2 - The S combinator

Let S = \x y z -> x z (y z)

This time, we gather the following constraints:

T0 = T2 -> T3
T1 = T2 -> T4
T3 = T4 -> T5

The first substitution gives a solution set {T0 : T2 -> T3} and a new constraint set

T1 = T2 -> T4
T3 = T4 -> T5

The second substitution gives a solution set {T0 : T2 -> T3 ; T1 : T2 -> T4} and a new constraint set

T3 = T4 -> T5

The final substitution gives the answer as the solution set

{T0 : T2 -> T4 -> T5 ; T1 : T2 -> T4 ; T3 = T4 -> T5}

and the type of the final expression is determined to be

forall Ti . (T2 -> T4 -> T5) -> (T2 -> T4) -> T4 -> T5 

Comments powered by Disqus