Introduction

The expression problem (1998)

In 1998, Philip Wadler defined the Expression Problem as follow:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety

In basic Haskell it is straightforward to add new functions over a data type. Suppose we have the following arithmetic expression data type:

We can independently add an evaluator function, potentially in another module:

However if we want to add a new case to the data type (say support for multiplication), we have to modify both the data type definition and the functions using it:

Data types à la carte (2008)

Ten years later (in 2008), Wouter Swierstra described a technique to handle this in his well-known Data types à la carte paper. The first idea is to define data constructors independently and to use a type parameter to leave open the data type they are part of.

Defining a new independent constructor is easy:

The second idea is to use a combining data type:

It is similar to Either except that it passes the same additional type parameter to both f and g type constructors. It can be used to compose independent data constructors without creating a new data type:

ExprF has kind Type -> Type and its type parameter is used as the e parameter of the independent data constructors. We can set it to arbitrary types such as Int to build valid values:

However the main use of this parameter should be to indicate the type of the expression data type we want to build, say Expr. Hence we would like to write something like this:

Oops, we can’t build this cyclic (infinite) type. This leads us to the third idea: use another data type to handle the recursive nature of the expression type:

We can abstract on it to use the same data type for different expression types:

Summary

We use 3 different kinds of data types:

1. Constructor data types: Val, Add, Mul
2. Combinator data type: :+:
3. Recursivity handling data type: Fix

By using these different data types we have untangled the construction of ADTs (algebraic data types) and we can freely add new constructor data types and mix them into different algebraic data types.

Operations on these algebraic data types can be defined independently by using type-classes and recursion schemes.

In this post I’d like to slightly revisit and enhance the approach by using modern Haskell and in particular another combinator data type and pattern synonyms.

A different combinator data type

Variant (xs :: [*]) is a sum type that contains a value whose type can be any type in the xs list (see this previous post). We can use it to define the following VariantF type:

We will use this type as a combinator data type. Similarly to the :+: combinator data type presented in the introduction, VariantF passes its e parameter to all of its “member” types and has an instance of the Functor class.

Now instead of writing f :+: g :+: h :+: i to combine constructor data types to form an ADT we can write VariantF '[f,g,h,i].

First benefit: just like using Variant is more efficient – O(1) memory usage and (de)construction – than using a nest of Either, using VariantF is more efficient than using a nest of :+:.

Let’s define an EADT type synonym as follow:

And a pattern synonym to easily build EADT values:

Second benefit: with modern Haskell we can define bidirectional pattern synonyms that make creation and matching of EADT values much more nicer.

Defining an EADT and building some values

Let’s define some constructors:

And an algebraic data type:

We can build EADT values in the following (equivalent) ways:

But it’s nicer to define some additional pattern synonyms:

We can pattern-match EADT values with VF or with pattern synonyms too:

Note that the wildcard match is necessary, otherwise the pattern-match is reported as non exhaustive by GHC (warning). This is one shortcoming due to the use of the Variant type.

Anyway it isn’t really an issue in this case because showAddValADT shouldn’t be written like this to be extensible.

We can define a new EADT:

We only need to change the type to declare values of the new EADT:

We can also lift values from one EADT to the other as follow:

Thanks to the type-checking, the opposite direction is not allowed:

Extensible operations on EADTS (i.e. that need to be extended when we add new constructors) are defined with type-classes.

Type classes and recursion schemes

We can use recursion schemes, for example:

Type classes and explicit recursion

We can also use explicit recursion, for example:

Transformation example

Let’s rewrite Swiestra’s example to apply the distributivity law. The operation is not extensible, hence we don’t have to use type classes to define it. It works on any EADT as long as the constraints are fulfilled: supporting Add and Mul constructors.

Constructor removal example (~desugaring)

Let’s show how to transform each multiplication into an addition while removing the MulF constructor from the EADT in a generic type-safe manner:

This kind of transformation can be used to transform a big language into a smaller one (e.g. Haskell into Core).

Annotation examples (parameter example)

Let’s add a constructor to support annotations in the AST. We let the annotation type parametric.

We can easily define an EADT supporting parametric annotations, for instance:

Now we can build an example of an annotated expression where the annotation type is String:

We can easily prune annotations from the expression similarly to what we did above to transform multiplications into additions:

As it is a bit sad to have to specify the type of the resulting EADT without the annotations, we can define the following helper function:

Type-checking example (paramorphism example)

First let’s add support for Float values:

Now we can write invalid expressions mixing both Int and Float values:

Let’s write a type-checker to detect these cases. For a given expression, our type-checker will either return its valid type (Int or Float) or an erroneous type with an explanation message:

Let’s try it:

The error message is not very useful, let’s write a better type-checker that indicates which expressions are at fault:

This time instead of using cata we will use another recursion scheme called para (for paramorphism) that also passes expression values (u and v in AddF instance):

Let’s check that we can easily extend our expression language and our type-checker without rewriting anything by adding support for type ascriptions (i.e., types specified by the programmer):

Now let’s check that asserting some types is correctly supported:

Finally let’s write another type-checker that adds inferred types as type-ascriptions to every node of the expression.

class TypeCheck3 (f :: * -> *) ys where

instance TypeCheck3 (VariantF '[]) ys where
typeCheck3' = undefined

instance (TypeCheck3 x ys, TypeCheck3 (VariantF xs) ys)  => TypeCheck3 (VariantF (x ': xs)) ys where
typeCheck3' v = case popVariantFHead v of
Right u -> typeCheck3' u
Left  w -> typeCheck3' w

instance (TypedF :<: ys, ValF :<: ys) => TypeCheck3 ValF ys where
typeCheck3' (ValF i) = TypedF TInt (Val i)

instance (TypedF :<: ys, FloatValF :<: ys) => TypeCheck3 FloatValF ys where
typeCheck3' (FloatValF f) = TypedF TFloat (FloatVal f)

instance
( TypedF :<: ys
, Functor (VariantF ys)
where
typeCheck3' (AddF u@(TypedF t1 _) v@(TypedF t2 _))
| t1 == t2       = TypedF t1 addVal
| TError _ <- t1 = TypedF (TError "") addVal -- don't propagate errors
| TError _ <- t2 = TypedF (TError "") addVal
| otherwise = TypedF (TError \$ "can't add " ++ show t1 ++ " with " ++ show t2) addVal
where

-- similar for MulF...

instance
( TypedF :<: ys
, Functor (VariantF ys)
) => TypeCheck3 TypedF ys
where
typeCheck3' (TypedF t u@(TypedF te _))
| t == te        = u                         -- Remove redundant ascription
| TError _ <- t  = TypedF t (VF u)           -- keep erroring ascription as-is
| TError _ <- te = TypedF (TError "") (VF u) -- don't propagate errors
| otherwise      = TypedF (TError \$ "wrong type ascription" ++
" (inferred type: " ++ show te ++ ")") (Typed t (VF u))

typeCheck3 :: forall xs.
( TypeCheck3 (VariantF xs) xs
, Functor (VariantF xs)
, TypedF :<: xs
typeCheck3 e = VF (cata typeCheck3' e :: TypedF (EADT xs))

Let’s try it:

And try it:

(+) :: TError ""
|
+- (+) :: TError "can't add TInt with TFloat"
|  |
|  +- (*) :: TInt
|  |  |
|  |  +- 2 :: TInt
|  |  |
|  |  `- 3 :: TInt
|  |
|  `- 10.0 :: TFloat
|
`- 5 :: TInt

Composition example

Finally, we can write the following one-line example that strips annotations, adds support for type-ascriptions, performs type-checking and displays the resulting EADT:

The compiler ensures that the EADT is valid at each phase. For instance if we forget to remove annotations, type-checking and drawing phases will complain at compile time:

This way we can easily encode phase invariants in the types.

Remarks

This post has been motivated by a discussion on Reddit about AST transformations: each phase of a compiler transforms an AST into another and there is a tension between using a different AST data type per phase (best approach if we consider phases independently) or a generic AST data type that can be used by every phase (best approach to easily compose/interleave phases).

The usual answer (as used by GHC) is taken from the “Trees that grow” paper and consists in using an ADT as a generic AST and to parameterize it with type families. I don’t really like it and I think that using EADTs instead could probably avoid the use of AST specific type-families and hence could make the AST more generic (easier to add phases, etc.).

I have used recursion schemes in this post without really introducing them. Patrick Thomson wrote a very good series of posts about them that I highly recommend.

Please let me know if you do some fun or interesting things with EADTs!