Blog/Developer/Building a Hindley Milner Type Inference System for Airscript

At Airkit, we developed our own programming language called Airscript. In case you’re curious why we built our own programming language, take a look at this Airkit blog!

When developers build apps with Airscript, we want them to feel confident about the code they write. This is the job of Airscript’s type checker. It helps developers catch errors before they even run their code.

The figure below demonstrates Airscript’s type checker warning developers against adding a text with a number.

Type checking doesn’t come without a cost. In some programming languages, developers must provide type annotations for every expression to enable type checking. This may be cumbersome for the developers.

Type inference solves this problem by allowing a compiler to automatically infer the types of expressions without requiring explicit type annotations. Languages that support type inference include Typescript, Rust, Haskell, and many others.

In this blog, I will talk about how I built a type inference system for Airscript using Hindley Milner’s type inference algorithm.

### What is type inference?

The best way to understand type inference is by looking at examples.

In the expression below, what can we infer about the type of a?

``````1
````a + 2````

Variable `a` must be a number because the ‘+’ operation requires two number operands.

Let’s try a harder example. What can we deduce about the type of `a` in the following expression?

``````1
````a + 2````

It has to be a list of numbers. By the ‘+’ operator, we can deduce that `a` is a number. By `a`, we can deduce that a is a list of numbers.

We just performed type inference! As we can see, type inference is the process of deducing the type of an expression based on its usage.

### Type Checking with Type Inference

Type inference can be used to catch tricky type errors like the one below.

In this example, we start off with variable `a`. The type checker infers that `a` is both an array of numbers and a number, which results in a type error.

Here are the rough steps the type checker was able to infer `a`‘s type:

1. By +, the type checker knows that both `a` and a are both numbers
2. By `a`, the type checker knows that a must be a list of number
3. Step 1 and step 2 contradict each other – `a` cannot be both a list and a number.

In the example below, the type checker recurses into the object and realizes that the inferred type based on `a` and `a+1` contradict one another.

As we can see from these examples, the type checker uses type inference to generate a set of statements about the variables. Then, it checks if the statements contradict each other to find any type errors.

### Hindley Milner Type System

Almost every statically typed functional programming language (e.g. Haskell and OCaml) that has type inference is based on the Hindley-Milner (HM) type system. The HM type system comes with a type inference algorithm called Algorithm W which is the basis for my implementation.

The idea behind the algorithm is similar to something we’ve all done before in middle school – solving a system of equations.

Let’s assume we have the following equations:

To “solve” this system of equations means to find a set of substitutions from variables to numbers that satisfy all the equations. In this example, the substitution `[y=2, x=3]`is valid because it satisfies both statements.

Some systems of equations may be invalid. In the example below, it is impossible to find a valid substitution because `x` cannot be both 3 and 2.

In type theory land, we refer to this as a unification error because we are unable to “unify” the equations.

The process of type inference in the HM algorithm is similar to solving a system of equations. Instead of trying to find substitutions from variables to numbers, the algorithm finds substitutions from variables to types. Let’s look at the HM algorithm in more detail.

### Type Inference Algorithm

The core idea behind Hindley Milner is that every expression has a type and each type is given a type variable. The algorithm walks the expression recursively and collects constraints about the type variables. Type variables are stored in a data structure called the type environment stack, which plays the role of symbol tables for compilers.

The algorithm then tries to solve for a valid substitution. But if two constraints conflict with one another, it terminates with a type error.

Let’s try and infer the type of the following function

``````1
2
3
``````def f(a) {
a + 2
}``````

Here are the steps of the algorithm:

• First, we assign type variables to the function’s input and output, which are `T1` and `T2` respectively. Therefore, the function f has type `T1 → T2`.
• Next, we extend the type environment to `[a: T1]` and recur into the body of the function
• We look up the type scheme of +, which is `ForAll[] (Int, Int -> Int)`. The empty list next to ForAll means that + does not have generic type parameters.
• The first argument is `a`. This is an indexed binding. The scheme of indexed binding is  `Forall ['X], (['X] → 'X)`. This means that indexed binding takes a list of `'X` and returns ‘X. We instantiate the type to give `[T3] -> 'T3`.
• We know that a‘s type variable is `T1`. So we unify  `[T3] -> 'T3` with `T1 -> Int` to obtain the substitution `[T1 = [T3], T3 = Int]`. We then simplify the substitution to get `[T1 = [Int], T3 = Int]`. The type of `a` is an `Int`.
• The second parameter is an `Int`, so no more work must be done.
• We unify the return type of + with T2, the type variable for the output of f to get the substitution [T2 = Int].
• Finally, we apply the substitutions we have collected so far `[T1 = [Int], T2 = Int ]` to `T1 -> T2` to conclude that f ‘s type is `[Int] → Int`.

Let’s look at more examples!

### Variables and Type Variables

To make debugging the type inference algorithm easier, I built a visualizer tool to list out all variables and type variables.

In the image below, I took screenshots of the variables and type variables for 3 different expressions with the same scope.

In frame 1, we can’t infer much about the variable, so `a` ‘s type is still the type variable.

In frame 2, we can infer that a is a list type. However, we don’t know the type of list. Therefore, the type of a is a list of `T39`. We can also see from the type variables that `T37`‘s substitution is a list of `T39`.

In frame 3, by `+ 2`, we can infer that `T39` is a number. Therefore, `a` ‘s type is [number].

### Let-In expressions

Some expressions can create new scopes and scoped variables. Let’s look at how HM deals with the Let-In expressions.

`Let-In` expressions create temporary scoped variables that are only available in its body. In this example, we declare a variable `var1` and initialize it as an object. In frame 1, the output of the body has a type of `{ hello: T39 }`, where `T39` is the element type of a.

In frame 2, we can deduce, based on the expression `UPPERCASE(var1.hello)`, that `var1.hello` is a string. Therefore, `T39` is a string. We can also infer that `a` is a list of strings.

We can see in this example that variables and type variables form an interconnected graph. Adding a new constraint into the graph may cause rippling effects to allow the system to infer seemingly unconnected nodes.

### Row Polymorphism

The HM type system doesn’t have first-class support for the record type. Luckily, there is an extension to HM called row polymorphism that supports the record type.

The core idea behind row polymorphism is that records can be extended into bigger records. A record type in this type system can be written as `{ t1: T1, ... tk: Tk, ρ}`. The `rho (ρ)` is a polymorphic variable that represents a variable that can be instantiated to extra fields.

Let’s suppose we start with a record variable a. Its type is { ρ }. If the compiler encounters `a.b + 1`, it extends the type of `a` to `{ b: number, ρ}`. In other words, each record type has the power to keep extending itself.

In this example, the variable a extends from a record containing just b to a record containing b and c.

### Conclusion

Building a robust type system for Airscript is a never-ending journey. Luckily, there is a lot of research on programming languages we can draw inspiration from. This is what I enjoy the most about working at Airkit – bridging the theoretical world with the real world. I love finding ways to incorporate cool concepts like Hindley Milner into our product to help our developers solve real-world problems in industries like e-commerce and insurance.