Tags: Haskell, Variant, Heterogeneous collections | February 21, 2018 |

This post is about managing collections (e.g., lists) of heterogeneous data types in Haskell.

Summary: in my opinion, using the data type I’ve called `Variant`

to handle heterogeneous collections is currently the best alternative. It is type-safe, efficient (both storage-wise and performance-wise) and easy to use.

# The problem

Suppose we have several algebraic data types for different geometric shapes:

```
data Circle = Circle
{ circleRadius :: Float
, circleCenter :: (Float,Float)
}
deriving (Show)
data Square = Square
{ squareTopLeft :: (Float,Float)
, squareWidth :: Float
}
deriving (Show)
```

and that we know how to resize them by a factor:

```
class Resizable a where
resize :: Float -> a -> a
instance Resizable Circle where
resize f (Circle r c) = Circle (f*r) c
instance Resizable Square where
resize f (Square tl w) = Square tl (f*w)
```

We can write a function to resize a list of shapes:

Let’s test it on a collection of circles:

```
> resizeAll 2 [Circle 5 (1,2), Circle 4 (2,3)]
[Circle {circleRadius = 10.0, circleCenter = (1.0,2.0)}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}]
```

Now on a collection of squares:

```
> resizeAll 2 [Square (1,2) 5, Square (2,3) 4]
[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 8.0}]
```

And finally on a collection mixing circles and squares:

```
> resizeAll 2 [Square (1,2) 5, Circle 4 (2,3)]
<interactive>:10:30: error:
• Couldn't match expected type ‘Square’ with actual type ‘Circle’
• In the expression: Circle 4 (2, 3)
In the second argument of ‘resizeAll’, namely
‘[Square (1, 2) 5, Circle 4 (2, 3)]’
In the expression: resizeAll 2 [Square (1, 2) 5, Circle 4 (2, 3)]
```

Argh! We forgot that lists can only contain elements having the same type! We would like an heterogeneous collection that can contain both squares and circles (and maybe more shapes)!

# Heterogeneous Collections

We can use the `Variant`

data type (or its alias `V`

) and the pattern synonym `V`

to wrap our values:

```
import Haskus.Utils.Variant
-- a collection of circles and squares
shapes :: [V '[Circle,Square]]
shapes = [ V $ Square (1,2) 5
, V $ Circle 4 (2,3)
, V $ Circle 1 (5,7)
]
data Triangle = Triangle deriving (Show)
-- a collection of circles, squares and triangles
shapes2 :: [V '[Circle,Square,Triangle]]
shapes2 = [ V $ Square (1,2) 5
, V $ Circle 4 (2,3)
, V $ Triangle
, V $ Circle 1 (5,7)
]
```

We can rewrite `resizeAll`

as follows:

```
resizeAllV :: AlterVariant Resizable cs => Float -> [V cs] -> [V cs]
resizeAllV f = fmap (alterVariant @Resizable (resize f))
```

It works on any list of variants as long as all the possible types in the heterogeneous list (i.e., types in `cs`

) have an associated `Resizable`

instance.

Let’s test it:

```
> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> resizeAllV 2 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}]
```

We can check that type-checking works as expected by using this function on `shapes2`

:

```
> resizeAllV 2 shapes2
<interactive>:3:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAllV’
• In the expression: resizeAllV 2 shapes2
In an equation for ‘it’: it = resizeAllV 2 shapes2
```

Indeed the type of `shapes2`

indicates that it may contain triangles but `Triangle`

has no `Resizable`

instance!

Another approach is to create an instance of `Resizable`

for any `Variant cs`

as long as every type in `cs`

has a `Resizable`

instance. It is defined recursively as follows:

```
-- this instance is never used but it is necessary for the type-checker
instance Resizable (V '[]) where
{-# INLINE resize #-}
resize = undefined
instance (Resizable (V xs), Resizable x) => Resizable (V (x ': xs)) where
{-# INLINE resize #-}
resize f v = case popVariantHead v of
Right x -> toVariantHead (resize f x)
Left xs -> toVariantTail (resize f xs)
```

We can now use the original `resizeAll`

function:

```
resizeAll :: Resizable a => Float -> [a] -> [a]
resizeAll f = fmap (resize f)
> resizeAll 3 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 15.0}
,Circle {circleRadius = 12.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 3.0, circleCenter = (5.0,7.0)}
]
> resizeAll 3 shapes2
<interactive>:66:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAll’
• In the expression: resizeAll 3 shapes2
In an equation for ‘it’: it = resizeAll 3 shapes2
```

We use this mechanism to provide `Eq`

, `Ord`

and `Show`

instances. That’s how GHCI shows the variant contents.

We can easily compose functions by “composing” the constraints on the collection types:

```
import Data.Foldable (traverse_)
-- print an heterogeneous collection
printAll :: TraverseVariant Show cs IO => [V cs] -> IO ()
printAll = traverse_ (traverseVariant_ @Show print)
resizeThenPrint :: (AlterVariant Resizable cs, TraverseVariant Show cs IO)
=> Float -> [V cs] -> IO ()
resizeThenPrint f = printAll . resizeAllV f
> resizeThenPrint 2 shapes
Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}
```

We can map a function to modify only values having some given type:

```
translateCircle :: Float -> Float -> Circle -> Circle
translateCircle fx fy (Circle r (x,y)) = Circle r (fx+x,fy+y)
> fmap (mapVariant (translateCircle 50 70)) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (52.0,73.0)}
,Circle {circleRadius = 1.0, circleCenter = (55.0,77.0)}
]
```

We can also map to a `Variant`

and fold the result:

```
transformCircle :: Circle -> V '[Square,Triangle]
transformCircle (Circle r xy)
| r > 3 = V (Square xy r)
| otherwise = V Triangle
> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> fmap (foldMapVariant transformCircle) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 4.0}
,Triangle]
> :t fmap (foldMapVariant transformCircle) shapes
fmap (foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle, Square]]
```

As you can see, it is possible to have the same type more than once in a `Variant`

. It is useful if you “select” types by index, e.g., `fromVariantAt @2 v`

(cf. `*At`

functions in Haddock).

You can also decide to keep a single entry per type with `nubVariant`

:

```
> :t fmap (nubVariant . foldMapVariant transformCircle) shapes
fmap (nubVariant . foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle]]
```

You can also extend or reorder the type list of a `Variant`

with `liftVariant`

:

```
> shapes3 = fmap liftVariant shapes :: [V '[Square,Int,Circle,String]]
> shapes3 ++ [V "Hey!", V (10 :: Int)]
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
,"Hey!"
,10
]
```

Obviously, we are not allowed to remove a type from the list (considered as a set):

```
> fmap liftVariant shapes :: [V '[Square,Int,String]]
<interactive>:112:6: error:
• `Circle' is not a member of '[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]
<interactive>:112:6: error:
• '[Circle, Square]
is not a subset of
'[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]
```

Let’s try to extract some shapes from the collection depending on their type:

```
import Data.Maybe (mapMaybe)
-- extract values having type "c" from the collection ("cs" must contain "c")
filterColl :: forall c cs. Popable c cs => [V cs] -> [c]
filterColl = mapMaybe (fromVariant @c)
> filterColl @Circle shapes
[Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> filterColl @Square shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]
> filterColl @Triangle shapes
<interactive>:44:1: error:
• `Triangle' is not a member of '[Circle, Square]
• In the expression: filterColl @Triangle shapes
In an equation for ‘it’: it = filterColl @Triangle shapes
```

Indeed no `Triangle`

can be in the `shapes`

collection as it is defined to hold only circles and squares.

We may want to write a generic function that filters any collection, even if we statically know that it can’t contain the given type. Let’s use `fromVariantMaybe`

function and `MaybePopable`

constraint for this:

```
-- extract values having type "c" from the collection ("cs" may not contain "c")
filterCollMaybe :: forall c cs. MaybePopable c cs => [V cs] -> [c]
filterCollMaybe = mapMaybe (fromVariantMaybe @c)
> filterCollMaybe @Triangle shapes
[]
> filterCollMaybe @Triangle shapes2
[Triangle]
> filterCollMaybe @Int shapes2
[]
```

Instead of keeping values having a given type, it may be more useful to filter out some types:

```
import Data.Either (partitionEithers)
removeAll :: forall c cs. MaybePopable c cs => [V cs] -> [V (Filter c cs)]
removeAll = fst . partitionEithers . fmap (popVariantMaybe @c)
> removeAll @Triangle shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]
> removeAll @Triangle shapes2
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]
> :t removeAll @Triangle shapes2
removeAll @Triangle shapes2 :: [V '[Circle, Square]] -- no more Triangle
> removeAll @Circle (removeAll @Triangle shapes2)
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]
```

Now let’s try some pattern-matching:

```
matchShapes :: (Popable Circle cs
,Popable Square cs
,MaybePopable Triangle cs
) => V cs -> String
matchShapes = \case
V (Circle r _) -> "circle radius: " ++ show r
V (Square _ w) -> "square width: " ++ show w
VMaybe (x :: Triangle) -> "triangle: " ++ show x
_ -> "unknown shape"
> fmap matchShapes shapes
["square width: 5.0"
,"circle radius: 4.0"
,"circle radius: 1.0"
]
> fmap matchShapes shapes2
["square width: 5.0"
,"circle radius: 4.0"
,"triangle: Triangle"
,"circle radius: 1.0"
]
```

`V`

pattern synonym and`Popable`

constraint are used for types that**must be**storable in the variant.`VMaybe`

pattern synonym and`MaybePopable`

constraint are used for types that**may not be**storable in the variant.

Sadly when we use pattern matching on a `Variant`

, the compiler can’t detect if we match all the possible types in the `Variant`

. Hence we need a failover wildcard match to avoid a “non-exhaustive pattern match” warning:

```
matchShapes' :: V '[Circle,Square] -> String
matchShapes' = \case
V (Circle r _) -> "circle radius: " ++ show r
V (Square _ w) -> "square width: " ++ show w
_ -> "unknown shape" -- required useless failover match
```

Maybe we could circumvent this with a compiler plugin (left as an exercise for the motivated reader).

# Conclusion

I hope this introduction to the use of the `Haskus.Utils.Variant`

type has been interesting. You can find it in the `haskus-utils`

package either on GitHub or on Hackage. Feedback is welcome!

There was already a brief description of the use of the `Variant`

data type to handle heterogeneous collections on this blog buried at the end of this previous post about control-flow. Nevertheless, after reading this recent post about using existentials, `Typeable`

, `Dynamic`

and the like to manage heterogeneous collections in Haskell, I figured it deserved its own post. It led to a major API update and to a new release of `haskus-utils`

package: I should write posts more often.

The first drafts of this post started with a presentation of the usual approaches to heterogeneous collections (i.e., the state-of-the-art). It was a bit boring and many readers would probably have left without reaching the `Variant`

part, hence the state-of-the-art is now presented in the annexe of this post.

# Annexe: alternatives and comparison

The wiki lists several solutions to the “heterogeneous collections” problem:

- Tuples
- Algebraic datatypes
- Data.Dynamic
- Existential types
- HLists

To which we should add:

- recursive GADT
- Variant

Let’s compare them!

## Tuples

We don’t want a statically fixed length collection, hence we exclude tuples.

Constraint | Tuple |
---|---|

Non-fixed length | NO |

## Algebraic datatypes

The idea is to create a wrapper `Shape`

datatype:

We can write an instance:

```
instance Resizable Shape where
resize f (ShapeCircle c) = ShapeCircle (resize f c)
resize f (ShapeSquare s) = ShapeSquare (resize f s)
```

This is not easily extensible: if we want to support a new shape (e.g., a triangle), we have to modify both the `Shape`

datatype and its `Resizable`

instance:

```
data Shape
= ShapeCircle Circle
| ShapeSquare Square
| ShapeTriangle Triangle
instance Resizable Shape where
..
resize (ShapeTriangle ...) = ...
```

Constraint | Tuple | ADT |
---|---|---|

Non-fixed length | NO | - |

Easily data extensible | - | NO |

Now suppose that for some reason we want to statically ensure that a collection only contains shapes with a central symmetry (e.g., squares and circles but not triangles). The only way is to have different shape types for the different constraints on the shapes:

```
data CentralSymShape
= CentralSymCircle Circle
| CentralSymSquare Square
fooAll :: [CentralSymShape] -> [CentralSymShape]
```

But now we can’t resize elements of this collection anymore, so we need:

Now adding a new shape is even more cumbersome because we potentially have to modify several datatypes and their instances.

Constraint | Tuple | ADT |
---|---|---|

Non-fixed length | NO | - |

Easily data extensible | - | NO |

Easily support constraints on elements | - | NO |

## Existential types

We can use existential types to embed some type-classes alongside some values. For instance we could rewrite our previous datatype as follows:

```
data Shape = forall a. Resizable a => Shape a
instance Resizable Shape where
resize f (Shape a) = Shape (resize f a)
```

Constraint | Tuple | ADT | Existentials |
---|---|---|---|

Non-fixed length | NO | - | - |

Easily data extensible | - | NO | YES |

Easily support constraints on elements | - | NO | YES |

The problem is that we have lost the knowledge of the actual types in the collection: we only know that they are `Resizable`

. For instance we can’t easily write a function extracting the circles:

In addition we can’t easily reuse collections. Suppose we also have a type-class `Drawable`

:

```
class Drawable a where
draw :: a -> IO ()
instance Drawable Circle where ...
instance Drawable Square where ...
instance Drawable Triangle where ...
```

We could write a `drawAll`

function by using the same technique:

```
data DrawableShape = forall a. Drawable a => DrawableShape a
instance Drawable DrawableShape where
draw (DrawableShape a) = draw a
drawAll :: [DrawableShape] -> IO ()
drawAll = traverse_ draw
```

Now if we want to resize and then to draw a collection of shapes:

We need to introduce another wrapper datatype with its boilerplate:

```
data ResizableDrawableShape = forall a. (Drawable a, Resizable a) => ResizableDrawableShape a
instance Drawable ResizableDrawableShape where
draw (ResizableDrawableShape a) = draw a
instance Resizable ResizableDrawableShape where
resize f (ResizableDrawableShape a) = ResizableDrawableShape (resize f a)
resizeAndDraw :: Float -> [ResizableDrawableShape] -> IO ()
resizeAndDraw f = drawAll . resizeAll f
```

Hence we want to keep the knowledge of the types of the elements in the collection at runtime instead of using a different wrapper type per collection.

Constraint | Tuple | ADT | Existentials |
---|---|---|---|

Non-fixed length | NO | - | - |

Easily data extensible | - | NO | YES |

Easily support constraints on elements | - | NO | YES |

Keep knowledge of element types at runtime | - | YES | NO |

## Data.Dynamic

`Data.Dynamic`

can be used to wrap any value. It allows us to rewrite `resizeAll`

as follows:

```
resizeAll :: Float -> [Dynamic] -> [Dynamic]
resizeAll f = fmap (resizeDyn f)
resizeDyn :: Float -> Dynamic -> Dynamic
resizeDyn f d =
case fromDynamic d :: Maybe Circle of
Just c -> toDyn (resize f c)
Nothing -> case fromDynamic d :: Maybe Square of
Just s -> toDyn (resize f s)
Nothing -> error "Invalid shape"
```

The issue is that any datatype can be wrapped into a `Dynamic`

, hence we lose some type safety. We want to restrict the types of the collection members using some constraints (`Resizable`

, etc.).

Constraint | Tuple | ADT | Existentials | Dynamic |
---|---|---|---|---|

Non-fixed length | NO | - | - | - |

Easily data extensible | - | NO | YES | YES |

Easily support constraints on elements | - | NO | YES | NO |

Keep knowledge of element types at runtime | - | YES | NO | YES |

## HList

Instead of wrapping the elements of the collection, we could change the collection type to one that keeps track of the element types *at compile time*. For instance, HList. However the size of the “list” is fixed at compile time.

Constraint | Tuple | ADT | Existentials | Dynamic | HList |
---|---|---|---|---|---|

Non-fixed length at runtime | NO | - | - | - | NO |

Easily data extensible | - | NO | YES | YES | YES |

Easily support constraints on elements | - | NO | YES | NO | NO |

Keep knowledge of element types | - | YES | NO | YES | YES |

Easy to use | YES | YES | YES | YES | NO |

## Recursive GADT

The most basic sum type in Haskell is `Either a b`

. It can be used to wrap a value whose type is either `a`

or `b`

.

Constraint | Tuple | ADT | Existential | Dynamic | HList | Either |
---|---|---|---|---|---|---|

Non-fixed length at runtime | NO | - | - | - | NO | - |

Easily data extensible | - | NO | YES | YES | YES | NO |

Easily support constraints on elements | - | NO | YES | NO | NO | NO |

Keep knowledge of element types | - | YES | NO | YES | YES | YES |

Easy to use | YES | YES | YES | YES | NO | YES |

More than two element types | - | YES | YES | YES | YES | NO |

We can generalize it with a GADT such as:

```
data Union (as :: [*]) where
Union :: Either (Union as) a -> Union (a ': as)
-- other implementations are possible...
```

To make any union of resizable types resizable, we can write the following instances:

```
-- required boilerplate instance
instance Resizable (Union '[]) where
resize = undefined
instance (Resizable a, Resizable (Union as)) => Resizable (Union (a ': as)) where
resize f (Union (Right a)) = Union (Right (resize f a))
resize f (Union (Left u)) = Union (Left (resize f u))
```

The issue with this approach is that the actual type of an element is obtained by a linear traversal of the `Either`

constructor nest. Moreover, we need to store this constructor nest for each element.

Constraint | Tuple | ADT | Existential | Dynamic | HList | Either | GADT |
---|---|---|---|---|---|---|---|

Non-fixed length at runtime | NO | - | - | - | NO | - | - |

Easily data extensible | - | NO | YES | YES | YES | NO | YES |

Easily support constraints on elements | - | NO | YES | NO | NO | NO | YES |

Keep knowledge of element types | - | YES | NO | YES | YES | YES | YES |

Easy to use | YES | YES | YES | YES | NO | YES | YES |

More than two element types | - | YES | YES | YES | YES | NO | YES |

Efficient tag storage/obtention | - | YES | - | - | - | YES | NO |

## Variant

Could we avoid the suboptimal storage of the “tag” used to determine the actual element type with the GADT approach? It turns out we can: we just need to use a number similarly to the way ADT values are internally stored.

This is how the `Haskus.Utils.Variant`

type is defined in my haskus-utils package.

Using a list of `Variant`

as our heterogeneous collection, we finally get:

Constraint | Tuple | ADT | Existential | Dynamic | HList | Either | GADT | Variant |
---|---|---|---|---|---|---|---|---|

Non-fixed length at runtime | NO | - | - | - | NO | - | - | - |

Easily data extensible | - | NO | YES | YES | YES | NO | YES | YES |

Easily support constraints on elements | - | NO | YES | NO | NO | NO | YES | YES |

Keep knowledge of element types | - | YES | NO | YES | YES | YES | YES | YES |

Easy to use | YES | YES | YES | YES | NO | YES | YES | YES |

More than two element types | - | YES | YES | YES | YES | NO | YES | YES |

Efficient tag storage/obtention | - | YES | - | - | - | YES | NO | YES |

Performance of `Variant`

operations should be close to performance of ADTs with one indirection (i.e., constructors with a single (lifted) field). In a previous post, we saw that some operations such as `rewriteAll`

in the example above were almost compiled to a direct pattern matching on the variant tag with GHC 8.0.1. My patch should have made this even better in GHC 8.2.1 but sadly it isn’t the case because of #14170… I hope to get it fixed soon.