# Introduction

## The expression problem (1998)

In 1998, Philip Wadler defined the Expression Problem as follow:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety

In basic Haskell it is straightforward to add new functions over a data type. Suppose we have the following arithmetic expression data type:

``data Expr = Val Int | Add Expr Expr``

We can independently add an evaluator function, potentially in another module:

``````eval :: Expr -> Int
eval (Val x)   =  x
eval (Add x y) = eval x + eval y``````

However if we want to add a new case to the data type (say support for multiplication), we have to modify both the data type definition and the functions using it:

``````data Expr = .... | Mul Expr Expr

eval :: Expr -> Int
....
eval (Mul x y) = eval x * eval y``````

## Data types à la carte (2008)

Ten years later (in 2008), Wouter Swierstra described a technique to handle this in his well-known Data types à la carte paper. The first idea is to define data constructors independently and to use a type parameter to leave open the data type they are part of.

``````-- Independent data constructors. Parameter `e` represents the data type they
-- will be part of. It is required even if it is not used in the right hand
-- side.
data Val e = Val Int

Defining a new independent constructor is easy:

``data Mul e = Mul e e``

The second idea is to use a combining data type:

``data (f :+: g) e = Inl (f e) | Inr (g e)``

It is similar to `Either` except that it passes the same additional type parameter to both `f` and `g` type constructors. It can be used to compose independent data constructors without creating a new data type:

``type ExprF = Val :+: Add``

`ExprF` has kind `Type -> Type` and its type parameter is used as the `e` parameter of the independent data constructors. We can set it to arbitrary types such as `Int` to build valid values:

``y = Inr (Add 5 8) :: ExprF Int``

However the main use of this parameter should be to indicate the type of the expression data type we want to build, say `Expr`. Hence we would like to write something like this:

``````type Expr = ExprF Expr

>error:
Cycle in type synonym declarations:
<interactive>:12:1-22: type Expr = ExprF Expr``````

Oops, we can’t build this cyclic (infinite) type. This leads us to the third idea: use another data type to handle the recursive nature of the expression type:

``data Expr = Expr (ExprF Expr)``

We can abstract on it to use the same data type for different expression types:

``````-- `Fix` type as defined in Data.Functor.Foldable for instance
newtype Fix f = Fix (f (Fix f))

type Expr = Fix ExprF``````

## Summary

We use 3 different kinds of data types:

1. Constructor data types: `Val`, `Add`, `Mul`
2. Combinator data type: `:+:`
3. Recursivity handling data type: `Fix`

By using these different data types we have untangled the construction of ADTs (algebraic data types) and we can freely add new constructor data types and mix them into different algebraic data types.

Operations on these algebraic data types can be defined independently by using type-classes and recursion schemes.

In this post I’d like to slightly revisit and enhance the approach by using modern Haskell and in particular another combinator data type and pattern synonyms.

## A different combinator data type

`Variant (xs :: [*])` is a sum type that contains a value whose type can be any type in the `xs` list (see this previous post). We can use it to define the following `VariantF` type:

``````newtype VariantF (xs :: [* -> *]) e = VariantF (Variant (ApplyAll e xs))

-- ApplyAll e '[f,g,h] ==> '[f e, g e, h e]

instance Functor (VariantF xs) where ....``````

We will use this type as a combinator data type. Similarly to the `:+:` combinator data type presented in the introduction, `VariantF` passes its `e` parameter to all of its “member” types and has an instance of the `Functor` class.

Now instead of writing `f :+: g :+: h :+: i` to combine constructor data types to form an ADT we can write `VariantF '[f,g,h,i]`.

First benefit: just like using `Variant` is more efficient – O(1) memory usage and (de)construction – than using a nest of `Either`, using `VariantF` is more efficient than using a nest of `:+:`.

Let’s define an EADT type synonym as follow:

``type EADT xs = Fix (VariantF xs)``

And a pattern synonym to easily build EADT values:

``````pattern VF :: forall e f cs.
( e ~ EADT cs  -- allow easy use of TypeApplication to set the EADT type
, f :<: cs     -- constraint synonym ensuring `f` is in `cs`
pattern VF x = Fix (VariantF (V' x))   -- `V'` match a variant value (without
-- checking the membership: we already
-- do it with :<:)``````

Second benefit: with modern Haskell we can define bidirectional pattern synonyms that make creation and matching of EADT values much more nicer.

## Defining an EADT and building some values

Let’s define some constructors:

``````{-# LANGUAGE DeriveFunctor #-}

data ValF e = ValF Int deriving (Functor)

And an algebraic data type:

``type AddValADT = EADT '[ValF,AddF]``

We can build EADT values in the following (equivalent) ways:

``````x :: AddValADT
x = VF (ValF 10)

But it’s nicer to define some additional pattern synonyms:

``````pattern Val :: ValF :<: xs => Int -> EADT xs
pattern Val a = VF (ValF a)

x = Val 10

We can pattern-match EADT values with `VF` or with pattern synonyms too:

``````showAddValADT :: AddValADT -> String
-- match with VF
VF (ValF u)  -> show u

-- match with additional pattern synonym

"(10 + 3)"``````

Note that the wildcard match is necessary, otherwise the pattern-match is reported as non exhaustive by GHC (warning). This is one shortcoming due to the use of the `Variant` type.

Anyway it isn’t really an issue in this case because `showAddValADT` shouldn’t be written like this to be extensible.

## Extending an EADT and converting values from one EADT to another

``data MulF e = MulF e e deriving (Functor)``

We can define a new EADT:

``type MulAddValADT = EADT '[ValF,AddF,MulF]``

We only need to change the type to declare values of the new EADT:

``````x' :: MulAddValADT
x' = Val 10``````

``````> x' = Val 10 :: EADT '[ValF,AddF]

We can also lift values from one EADT to the other as follow:

``````xx' :: MulAddValADT

Thanks to the type-checking, the opposite direction is not allowed:

``````> liftEADT xx' :: AddValADT

<interactive>:83:1: error:
• '[MulF]
is not a subset of

Extensible operations on EADTS (i.e. that need to be extended when we add new constructors) are defined with type-classes.

### Type classes and recursion schemes

We can use recursion schemes, for example:

``````import Data.Functor.Foldable

class ShowEADT (f :: * -> *) where
showEADT' :: f String -> String

-- boilerplate instances for the combinator data type

-- instances for constructors data types

showEADT' (ValF i) = show i

showEADT' (AddF u v) = "(" ++ u ++ " + " ++ v ++ ")" -- no recursive call

showEADT' (MulF u v) = "(" ++ u ++ " * " ++ v ++ ")" -- no recursive call

-- `cata` (for catamorphism) handles the recursion/fold

"(2 + (10 + 3))"

"(2 + (10 + 3))"

"(2 * (10 + 3))"

showEADT :: (ShowEADT (Base t), Recursive t) => t -> String -- type inferred by GHC

"(2 * (10 + 3))"``````

### Type classes and explicit recursion

We can also use explicit recursion, for example:

``````class Eval e where
-- evaluate an expression
eval :: e -> Int

-- same boilerplate

instance Eval (VariantF '[] e) where
eval = undefined

instance (Eval (x e), Eval (VariantF xs e))  => Eval (VariantF (x ': xs) e) where
eval v = case popVariantFHead v of
Right u -> eval u
Left  w -> eval w

type EvalAll xs = Eval (VariantF xs (EADT xs))

-- instances for constructors

instance Eval (ValF e) where
eval (ValF i) = i

instance EvalAll xs => Eval (MulF (EADT xs)) where
eval (MulF u v) = evalEADT u * evalEADT v -- explicit recursion

15
15
26``````

### Transformation example

Let’s rewrite Swiestra’s example to apply the distributivity law. The operation is not extensible, hence we don’t have to use type classes to define it. It works on any EADT as long as the constraints are fulfilled: supporting Add and Mul constructors.

``````import Control.Arrow

-- distribute multiplication over addition if it matches
distr (Mul a (Add c d)) = Just (Add (Mul a c) (Mul a d))
distr _                 = Nothing

-- bottom up traversal that performs an additional bottom up traversal in
-- the transformed sub-tree when a transformation occurs.
bottomUpFixed :: Functor f => (Fix f -> Maybe (Fix f)) -> Fix f -> Fix f
bottomUpFixed f = unfix >>> fmap (bottomUpFixed f) >>> Fix >>> f'
where
f' u = case f u of
Nothing -> u
Just v  -> bottomUpFixed f v

-- example
expr = Mul (Val 2) (Add (Val 10) (Mul (Val 3) (Add (Val 5) (Val 20))))

"(2 * (10 + (3 * (5 + 20))))"

"((2 * 10) + ((2 * (3 * 5)) + (2 * (3 * 20))))"``````

### Constructor removal example (~desugaring)

Let’s show how to transform each multiplication into an addition while removing the `MulF` constructor from the EADT in a generic type-safe manner:

``````class MulToAdd (f :: * -> *) ys where

-- boilerplate

instance MulToAdd (VariantF '[]) ys where

-- constructor instances

-- (it makes no sense but it's just an example)

instance {-# OVERLAPPABLE #-} f :<: ys => MulToAdd f ys where
mulToAdd = VF  -- the other constructors are kept as-is

-- example
expr = Mul (Val 2) (Add (Val 10) (Mul (Val 3) (Add (Val 5) (Val 20))))

"(2 + (10 + (3 + (5 + 20))))"

"((2 + 10) + ((2 + (3 + 5)) + (2 + (3 + 20))))"``````

This kind of transformation can be used to transform a big language into a smaller one (e.g. Haskell into Core).

### Annotation examples (parameter example)

Let’s add a constructor to support annotations in the AST. We let the annotation type parametric.

``````-- | Annotate `e` with a value having type `a`
data AnnotF a e = AnnotF a e deriving (Functor)

pattern Annot :: AnnotF a :<: xs => a -> EADT xs -> EADT xs
pattern Annot a e = VF (AnnotF a e)

instance Show a => ShowEADT (AnnotF a) where
showEADT' (AnnotF a e) = "{" ++ show a ++ "} " ++ e``````

We can easily define an EADT supporting parametric annotations, for instance:

``type AnnotExp a = EADT '[ValF,AddF,MulF,AnnotF a]``

Now we can build an example of an annotated expression where the annotation type is `String`:

``````annotExpr :: AnnotExp String
annotExpr = Mul (Val 8) (Annot "An addition" (Add (Val 10) (Val 5)))

"(8 * {\"An addition\"} (10 + 5))"``````

We can easily prune annotations from the expression similarly to what we did above to transform multiplications into additions:

``````class RemoveAnnot (f :: * -> *) ys where

instance RemoveAnnot (VariantF '[]) ys where
removeAnnot' = undefined

instance
( RemoveAnnot x ys
, RemoveAnnot (VariantF xs) ys
)  => RemoveAnnot (VariantF (x ': xs)) ys
where
removeAnnot' v = case popVariantFHead v of
Right u -> removeAnnot' u
Left  w -> removeAnnot' w

instance{-# OVERLAPPING #-} RemoveAnnot (AnnotF a) ys where
removeAnnot' (AnnotF _a e) = e

instance {-# OVERLAPPABLE #-} f :<: ys => RemoveAnnot f ys where
removeAnnot' = VF

"(8 * (10 + 5))"``````

As it is a bit sad to have to specify the type of the resulting EADT without the annotations, we can define the following helper function:

``````-- remove `AnnotF a` (for all `a`) in `xs`
type family FilterAnnot xs where
FilterAnnot '[]              = '[]
FilterAnnot (AnnotF a ': xs) = FilterAnnot xs
FilterAnnot (x ': xs)        = x ': FilterAnnot xs

removeAnnot ::
( Functor (VariantF xs)
, RemoveAnnot (VariantF xs) (FilterAnnot xs)
removeAnnot = cata removeAnnot'

"(8 * (10 + 5))"``````

### Type-checking example (paramorphism example)

First let’s add support for Float values:

``````data FloatValF e = FloatValF Float deriving (Functor)

pattern FloatVal :: FloatValF :<: xs => Float -> EADT xs
pattern FloatVal a = VF (FloatValF a)

showEADT' (FloatValF i) = show i``````

Now we can write invalid expressions mixing both Int and Float values:

``````> showEADT (Add (Val 10) (FloatVal 5) :: EADT '[ValF,FloatValF,AddF])
"(10 + 5.0)"``````

Let’s write a type-checker to detect these cases. For a given expression, our type-checker will either return its valid type (Int or Float) or an erroneous type with an explanation message:

``````data Typ
= TInt
| TFloat
| TError String   -- the type of an invalid expression with some explanation
deriving (Show,Eq)

class TypeCheck (f :: * -> *) where
typeCheck' :: f Typ -> Typ

instance TypeCheck (VariantF '[]) where
typeCheck' = undefined

instance (TypeCheck x, TypeCheck (VariantF xs))  => TypeCheck (VariantF (x ': xs)) where
typeCheck' v = case popVariantFHead v of
Right u -> typeCheck' u
Left  w -> typeCheck' w

instance TypeCheck ValF where
typeCheck' _ = TInt

instance TypeCheck FloatValF where
typeCheck' _ = TFloat

| t1 == t2       = t1
| TError _ <- t1 = t1 -- propagate errors
| TError _ <- t2 = t2
| otherwise      = TError \$ "can't match " ++ show t1 ++ " with " ++ show t2``````

Let’s try it:

``````> cata typeCheck' (Add (Val 10) (Val 5) :: EADT '[ValF,FloatValF,AddF])
TInt

TError "can't match TInt with TFloat"``````

The error message is not very useful, let’s write a better type-checker that indicates which expressions are at fault:

``````class TypeCheck2 (f :: * -> *) ys where
typeCheck2' :: f (EADT ys, Typ) -> Typ

instance TypeCheck2 (VariantF '[]) ys where
typeCheck2' = undefined

instance (TypeCheck2 x ys, TypeCheck2 (VariantF xs) ys)  => TypeCheck2 (VariantF (x ': xs)) ys where
typeCheck2' v = case popVariantFHead v of
Right u -> typeCheck2' u
Left  w -> typeCheck2' w

instance TypeCheck2 ValF ys where
typeCheck2' _ = TInt

instance TypeCheck2 FloatValF ys where
typeCheck2' _ = TFloat

| t1 == t2       = t1
| TError _ <- t1 = t1 -- propagate errors
| TError _ <- t2 = t2
| otherwise = TError \$ "can't add `" ++ showEADT u ++ "` whose type is " ++ show t1 ++
" with `" ++ showEADT v ++ "` whose type is " ++ show t2``````

This time instead of using `cata` we will use another recursion scheme called `para` (for paramorphism) that also passes expression values (`u` and `v` in `AddF` instance):

``````> para typeCheck2' (Add (Val 10) (Val 5) :: EADT '[ValF,FloatValF,AddF])
TInt

TError "can't add `10` whose type is TInt with `5.0` whose type is TFloat"``````

Let’s check that we can easily extend our expression language and our type-checker without rewriting anything by adding support for type ascriptions (i.e., types specified by the programmer):

``````data TypedF e = TypedF Typ e deriving (Functor)

pattern Typed :: TypedF :<: xs => Typ -> EADT xs -> EADT xs
pattern Typed t e = VF (TypedF t e)

showEADT' (TypedF t e) = "(" ++ e ++ " :: " ++ show t ++ ")"

instance (Functor (VariantF ys), ShowEADT (VariantF ys)) => TypeCheck2 TypedF ys where
typeCheck2' (TypedF t (e,te))
| t == te   = t
| TError _ <- t  = t
| TError _ <- te = te
| otherwise = TError \$ "wrong type ascription " ++ show t ++
" for expression `" ++ showEADT e ++
"` with inferred type " ++ show te``````

Now let’s check that asserting some types is correctly supported:

``````> type E = EADT '[ValF,FloatValF,AddF,TypedF]

> para typeCheck2' (Add (Val 10) (Typed TInt \$ Val 5) :: E)
TInt

> para typeCheck2' (Add (Val 10) (Typed TFloat \$ Val 5) :: E)
TError "wrong type ascription TFloat for expression `5` with inferred type TInt"

> para typeCheck2' (Typed TInt \$ Add (Val 10) (Val 5) :: E)
TInt

> para typeCheck2' (Typed TFloat \$ Add (Val 10) (Val 5) :: E)
TError "wrong type ascription TFloat for expression `(10 + 5)` with inferred type TInt"``````

Finally let’s write another type-checker that adds inferred types as type-ascriptions to every node of the expression.

``````class TypeCheck3 (f :: * -> *) ys where

instance TypeCheck3 (VariantF '[]) ys where
typeCheck3' = undefined

instance (TypeCheck3 x ys, TypeCheck3 (VariantF xs) ys)  => TypeCheck3 (VariantF (x ': xs)) ys where
typeCheck3' v = case popVariantFHead v of
Right u -> typeCheck3' u
Left  w -> typeCheck3' w

instance (TypedF :<: ys, ValF :<: ys) => TypeCheck3 ValF ys where
typeCheck3' (ValF i) = TypedF TInt (Val i)

instance (TypedF :<: ys, FloatValF :<: ys) => TypeCheck3 FloatValF ys where
typeCheck3' (FloatValF f) = TypedF TFloat (FloatVal f)

instance
( TypedF :<: ys
, Functor (VariantF ys)
where
typeCheck3' (AddF u@(TypedF t1 _) v@(TypedF t2 _))
| t1 == t2       = TypedF t1 addVal
| TError _ <- t1 = TypedF (TError "") addVal -- don't propagate errors
| TError _ <- t2 = TypedF (TError "") addVal
| otherwise = TypedF (TError \$ "can't add " ++ show t1 ++ " with " ++ show t2) addVal
where

-- similar for MulF...

instance
( TypedF :<: ys
, Functor (VariantF ys)
) => TypeCheck3 TypedF ys
where
typeCheck3' (TypedF t u@(TypedF te _))
| t == te        = u                         -- Remove redundant ascription
| TError _ <- t  = TypedF t (VF u)           -- keep erroring ascription as-is
| TError _ <- te = TypedF (TError "") (VF u) -- don't propagate errors
| otherwise      = TypedF (TError \$ "wrong type ascription" ++
" (inferred type: " ++ show te ++ ")") (Typed t (VF u))

typeCheck3 :: forall xs.
( TypeCheck3 (VariantF xs) xs
, Functor (VariantF xs)
, TypedF :<: xs
typeCheck3 e = VF (cata typeCheck3' e :: TypedF (EADT xs))``````

Let’s try it:

``````> type E = EADT '[ValF,FloatValF,AddF,MulF,TypedF]

"(((10 :: TInt) + (5 :: TInt)) :: TInt)"

"(((10 :: TInt) + (5.0 :: TFloat)) :: TError \"can't add TInt with TFloat\")"

"(((((((2 :: TInt) * (3 :: TInt)) :: TInt) + (10.0 :: TFloat))
:: TError \"can't add TInt with TFloat\") + (5 :: TInt)) :: TError \"\")"``````

``````import Data.Tree

class DisplayEADT (f :: * -> *) where
displayEADT' :: f (Tree String) -> Tree String

displayEADT' (ValF i) = Node (show i) []

displayEADT' (FloatValF i) = Node (show i) []

displayEADT' (MulF u v) = Node "(*)" [u,v]

displayEADT' (TypedF t (Node l ts)) = Node (l ++ " :: " ++ show t) ts

And try it:

``````> drawEADT (typeCheck3 (Add (Add (Mul (Val 2) (Val 3)) (FloatVal 10)) (Val 5)) :: E)

(+) :: TError ""
|
+- (+) :: TError "can't add TInt with TFloat"
|  |
|  +- (*) :: TInt
|  |  |
|  |  +- 2 :: TInt
|  |  |
|  |  `- 3 :: TInt
|  |
|  `- 10.0 :: TFloat
|
`- 5 :: TInt``````

### Composition example

Finally, we can write the following one-line example that strips annotations, adds support for type-ascriptions, performs type-checking and displays the resulting EADT:

``````annotExpr :: EADT '[ValF,AddF,MulF,AnnotF String]
annotExpr = Mul (Val 8) (Annot "An addition" (Add (Val 10) (Val 5)))

(*) :: TInt
|
+- 8 :: TInt
|
`- (+) :: TInt
|
+- 10 :: TInt
|
`- 5 :: TInt``````

The compiler ensures that the EADT is valid at each phase. For instance if we forget to remove annotations, type-checking and drawing phases will complain at compile time:

``````> drawEADT (typeCheck3 (appendEADT @'[TypedF] annotExpr))

<interactive>:367:1: error:
• No instance for (DisplayEADT (AnnotF String))
arising from a use of ‘drawEADT’
• In the expression:
In an equation for ‘it’:

<interactive>:367:11: error:
• No instance for (TypeCheck3
(AnnotF String) '[ValF, AddF, MulF, AnnotF String, TypedF])
arising from a use of ‘typeCheck3’
• In the first argument of ‘drawEADT’, namely
In the expression:
In an equation for ‘it’:

This way we can easily encode phase invariants in the types.

# Remarks

This post has been motivated by a discussion on Reddit about AST transformations: each phase of a compiler transforms an AST into another and there is a tension between using a different AST data type per phase (best approach if we consider phases independently) or a generic AST data type that can be used by every phase (best approach to easily compose/interleave phases).

The usual answer (as used by GHC) is taken from the “Trees that grow” paper and consists in using an ADT as a generic AST and to parameterize it with type families. I don’t really like it and I think that using EADTs instead could probably avoid the use of AST specific type-families and hence could make the AST more generic (easier to add phases, etc.).

I have used recursion schemes in this post without really introducing them. Patrick Thomson wrote a very good series of posts about them that I highly recommend.

Please let me know if you do some fun or interesting things with EADTs!

Sources can be found here: https://github.com/haskus/haskus-utils in `Haskus.Utils.VariantF`.

Following the discussion on Reddit about this post, I have implemented the speculated “project” function from the Compositional data types paper (Future Work section) with the proposed type. I have called it `popEADT` and it allows us to perform complete case analysis (with its friends `popVariantF` and `variantFToValue`), e.g.:

``````showMulAddValADT :: MulAddValADT -> String
Right (ValF u) -> show u
Left w         -> case popVariantF w of
Notice that contrary to the previous `showAddValADT` example in the post, there is no need for any extra wildcard match this time because GHC detects that the pattern-matching is complete.