Heterogeneous collections
 Tags: Haskell, Variant, Heterogeneous collections February 21, 2018

This post is about managing collections (e.g., lists) of heterogeneous data types in Haskell.

Summary: in my opinion, using the data type I’ve called `Variant` to handle heterogeneous collections is currently the best alternative. It is type-safe, efficient (both storage-wise and performance-wise) and easy to use.

# The problem

Suppose we have several algebraic data types for different geometric shapes:

``````data Circle = Circle
{ circleRadius :: Float
, circleCenter :: (Float,Float)
}
deriving (Show)

data Square = Square
{ squareTopLeft :: (Float,Float)
, squareWidth   :: Float
}
deriving (Show)``````

and that we know how to resize them by a factor:

``````class Resizable a where
resize :: Float -> a -> a

instance Resizable Circle where
resize f (Circle r c) = Circle (f*r) c

instance Resizable Square where
resize f (Square tl w) = Square tl (f*w)``````

We can write a function to resize a list of shapes:

``````resizeAll :: Resizable a => Float -> [a] -> [a]
resizeAll f = fmap (resize f)``````

Let’s test it on a collection of circles:

``````> resizeAll 2 [Circle 5 (1,2), Circle 4 (2,3)]

[Circle {circleRadius = 10.0, circleCenter = (1.0,2.0)}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}]``````

Now on a collection of squares:

``````> resizeAll 2 [Square (1,2) 5, Square (2,3) 4]

[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 8.0}]``````

And finally on a collection mixing circles and squares:

``````> resizeAll 2 [Square (1,2) 5, Circle 4 (2,3)]

<interactive>:10:30: error:
• Couldn't match expected type ‘Square’ with actual type ‘Circle’
• In the expression: Circle 4 (2, 3)
In the second argument of ‘resizeAll’, namely
‘[Square (1, 2) 5, Circle 4 (2, 3)]’
In the expression: resizeAll 2 [Square (1, 2) 5, Circle 4 (2, 3)]``````

Argh! We forgot that lists can only contain elements having the same type! We would like an heterogeneous collection that can contain both squares and circles (and maybe more shapes)!

# Heterogeneous Collections

We can use the `Variant` data type (or its alias `V`) and the pattern synonym `V` to wrap our values:

``````import Haskus.Utils.Variant

-- a collection of circles and squares
shapes :: [V '[Circle,Square]]
shapes = [ V \$ Square (1,2) 5
, V \$ Circle 4 (2,3)
, V \$ Circle 1 (5,7)
]

data Triangle = Triangle deriving (Show)

-- a collection of circles, squares and triangles
shapes2 :: [V '[Circle,Square,Triangle]]
shapes2 = [ V \$ Square (1,2) 5
, V \$ Circle 4 (2,3)
, V \$ Triangle
, V \$ Circle 1 (5,7)
]``````

We can rewrite `resizeAll` as follows:

``````resizeAllV :: AlterVariant Resizable cs => Float -> [V cs] -> [V cs]
resizeAllV f = fmap (alterVariant @Resizable (resize f))``````

It works on any list of variants as long as all the possible types in the heterogeneous list (i.e., types in `cs`) have an associated `Resizable` instance.

Let’s test it:

``````> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]

> resizeAllV 2 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}]``````

We can check that type-checking works as expected by using this function on `shapes2`:

``````> resizeAllV 2 shapes2

<interactive>:3:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAllV’
• In the expression: resizeAllV 2 shapes2
In an equation for ‘it’: it = resizeAllV 2 shapes2``````

Indeed the type of `shapes2` indicates that it may contain triangles but `Triangle` has no `Resizable` instance!

Another approach is to create an instance of `Resizable` for any `Variant cs` as long as every type in `cs` has a `Resizable` instance. It is defined recursively as follows:

``````-- this instance is never used but it is necessary for the type-checker
instance Resizable (V '[]) where
{-# INLINE resize #-}
resize = undefined

instance (Resizable (V xs), Resizable x) => Resizable (V (x ': xs)) where
{-# INLINE resize #-}
resize f v = case popVariantHead v of
Right x -> toVariantHead (resize f x)
Left xs -> toVariantTail (resize f xs)``````

We can now use the original `resizeAll` function:

``````resizeAll :: Resizable a => Float -> [a] -> [a]
resizeAll f = fmap (resize f)

> resizeAll 3 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 15.0}
,Circle {circleRadius = 12.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 3.0, circleCenter = (5.0,7.0)}
]

> resizeAll 3 shapes2

<interactive>:66:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAll’
• In the expression: resizeAll 3 shapes2
In an equation for ‘it’: it = resizeAll 3 shapes2``````

We use this mechanism to provide `Eq`, `Ord` and `Show` instances. That’s how GHCI shows the variant contents.

We can easily compose functions by “composing” the constraints on the collection types:

``````import Data.Foldable (traverse_)

-- print an heterogeneous collection
printAll :: TraverseVariant Show cs IO => [V cs] -> IO ()
printAll = traverse_ (traverseVariant_ @Show print)

resizeThenPrint :: (AlterVariant Resizable cs, TraverseVariant Show cs IO)
=> Float -> [V cs] -> IO ()
resizeThenPrint f = printAll . resizeAllV f

> resizeThenPrint 2 shapes
Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}``````

We can map a function to modify only values having some given type:

``````translateCircle :: Float -> Float -> Circle -> Circle
translateCircle fx fy (Circle r (x,y)) = Circle r (fx+x,fy+y)

> fmap (mapVariant (translateCircle 50 70)) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (52.0,73.0)}
,Circle {circleRadius = 1.0, circleCenter = (55.0,77.0)}
]``````

We can also map to a `Variant` and fold the result:

``````transformCircle :: Circle -> V '[Square,Triangle]
transformCircle (Circle r xy)
| r > 3     = V (Square xy r)
| otherwise = V Triangle

> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]

> fmap (foldMapVariant transformCircle) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 4.0}
,Triangle]

> :t fmap (foldMapVariant transformCircle) shapes
fmap (foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle, Square]]``````

As you can see, it is possible to have the same type more than once in a `Variant`. It is useful if you “select” types by index, e.g., `fromVariantAt @2 v` (cf. `*At` functions in Haddock).

You can also decide to keep a single entry per type with `nubVariant`:

``````> :t fmap (nubVariant . foldMapVariant transformCircle) shapes
fmap (nubVariant . foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle]]``````

You can also extend or reorder the type list of a `Variant` with `liftVariant`:

``````> shapes3 = fmap liftVariant shapes :: [V '[Square,Int,Circle,String]]

> shapes3 ++ [V "Hey!", V (10 :: Int)]
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
,"Hey!"
,10
]``````

Obviously, we are not allowed to remove a type from the list (considered as a set):

``````> fmap liftVariant shapes :: [V '[Square,Int,String]]

<interactive>:112:6: error:
• `Circle' is not a member of '[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]

<interactive>:112:6: error:
• '[Circle, Square]
is not a subset of
'[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]``````

Let’s try to extract some shapes from the collection depending on their type:

``````import Data.Maybe (mapMaybe)

-- extract values having type "c" from the collection ("cs" must contain "c")
filterColl :: forall c cs. Popable c cs => [V cs] -> [c]
filterColl = mapMaybe (fromVariant @c)

> filterColl @Circle shapes
[Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]

> filterColl @Square shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]

> filterColl @Triangle shapes

<interactive>:44:1: error:
• `Triangle' is not a member of '[Circle, Square]
• In the expression: filterColl @Triangle shapes
In an equation for ‘it’: it = filterColl @Triangle shapes``````

Indeed no `Triangle` can be in the `shapes` collection as it is defined to hold only circles and squares.

We may want to write a generic function that filters any collection, even if we statically know that it can’t contain the given type. Let’s use `fromVariantMaybe` function and `MaybePopable` constraint for this:

``````-- extract values having type "c" from the collection ("cs" may not contain "c")
filterCollMaybe :: forall c cs. MaybePopable c cs => [V cs] -> [c]
filterCollMaybe = mapMaybe (fromVariantMaybe @c)

> filterCollMaybe @Triangle shapes
[]
> filterCollMaybe @Triangle shapes2
[Triangle]
> filterCollMaybe @Int shapes2
[]``````

Instead of keeping values having a given type, it may be more useful to filter out some types:

``````import Data.Either (partitionEithers)

removeAll :: forall c cs. MaybePopable c cs => [V cs] -> [V (Filter c cs)]
removeAll = fst . partitionEithers . fmap (popVariantMaybe @c)

> removeAll @Triangle shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]

> removeAll @Triangle shapes2
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]

> :t removeAll @Triangle shapes2
removeAll @Triangle shapes2 :: [V '[Circle, Square]] -- no more Triangle

> removeAll @Circle (removeAll @Triangle shapes2)
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]``````

Now let’s try some pattern-matching:

``````matchShapes :: (Popable Circle cs
,Popable Square cs
,MaybePopable Triangle cs
) => V cs -> String
matchShapes = \case
V (Circle r _)         -> "circle radius: " ++ show r
V (Square _ w)         -> "square width: " ++ show w
VMaybe (x :: Triangle) -> "triangle: " ++ show x
_                      -> "unknown shape"

> fmap matchShapes shapes
["square width: 5.0"
]

> fmap matchShapes shapes2
["square width: 5.0"
,"triangle: Triangle"
]``````
• `V` pattern synonym and `Popable` constraint are used for types that must be storable in the variant.
• `VMaybe` pattern synonym and `MaybePopable` constraint are used for types that may not be storable in the variant.

Sadly when we use pattern matching on a `Variant`, the compiler can’t detect if we match all the possible types in the `Variant`. Hence we need a failover wildcard match to avoid a “non-exhaustive pattern match” warning:

``````matchShapes' :: V '[Circle,Square] -> String
matchShapes' = \case
V (Circle r _)         -> "circle radius: " ++ show r
V (Square _ w)         -> "square width: " ++ show w
_                      -> "unknown shape" -- required useless failover match``````

Maybe we could circumvent this with a compiler plugin (left as an exercise for the motivated reader).

# Conclusion

I hope this introduction to the use of the `Haskus.Utils.Variant` type has been interesting. You can find it in the `haskus-utils` package either on GitHub or on Hackage. Feedback is welcome!

There was already a brief description of the use of the `Variant` data type to handle heterogeneous collections on this blog buried at the end of this previous post about control-flow. Nevertheless, after reading this recent post about using existentials, `Typeable`, `Dynamic` and the like to manage heterogeneous collections in Haskell, I figured it deserved its own post. It led to a major API update and to a new release of `haskus-utils` package: I should write posts more often.

The first drafts of this post started with a presentation of the usual approaches to heterogeneous collections (i.e., the state-of-the-art). It was a bit boring and many readers would probably have left without reaching the `Variant` part, hence the state-of-the-art is now presented in the annexe of this post.

# Annexe: alternatives and comparison

The wiki lists several solutions to the “heterogeneous collections” problem:

• Tuples
• Algebraic datatypes
• Data.Dynamic
• Existential types
• HLists

To which we should add:

• Variant

Let’s compare them!

## Tuples

We don’t want a statically fixed length collection, hence we exclude tuples.

Constraint Tuple
Non-fixed length NO

## Algebraic datatypes

The idea is to create a wrapper `Shape` datatype:

``````data Shape
= ShapeCircle Circle
| ShapeSquare Square``````

We can write an instance:

``````instance Resizable Shape where
resize f (ShapeCircle c) = ShapeCircle (resize f c)
resize f (ShapeSquare s) = ShapeSquare (resize f s)``````

This is not easily extensible: if we want to support a new shape (e.g., a triangle), we have to modify both the `Shape` datatype and its `Resizable` instance:

``````data Shape
= ShapeCircle Circle
| ShapeSquare Square
| ShapeTriangle Triangle

instance Resizable Shape where
..
resize (ShapeTriangle ...) = ...``````
Non-fixed length NO -
Easily data extensible - NO

Now suppose that for some reason we want to statically ensure that a collection only contains shapes with a central symmetry (e.g., squares and circles but not triangles). The only way is to have different shape types for the different constraints on the shapes:

``````data CentralSymShape
= CentralSymCircle Circle
| CentralSymSquare Square

fooAll :: [CentralSymShape] -> [CentralSymShape]``````

But now we can’t resize elements of this collection anymore, so we need:

``````instance Resizable CentralSymShape where
...``````

Now adding a new shape is even more cumbersome because we potentially have to modify several datatypes and their instances.

Non-fixed length NO -
Easily data extensible - NO
Easily support constraints on elements - NO

## Existential types

We can use existential types to embed some type-classes alongside some values. For instance we could rewrite our previous datatype as follows:

``````data Shape = forall a. Resizable a => Shape a

instance Resizable Shape where
resize f (Shape a) = Shape (resize f a)``````
Constraint Tuple ADT Existentials
Non-fixed length NO - -
Easily data extensible - NO YES
Easily support constraints on elements - NO YES

The problem is that we have lost the knowledge of the actual types in the collection: we only know that they are `Resizable`. For instance we can’t easily write a function extracting the circles:

``````extractCircles :: [Shape] -> [Circle]
extractCircles xs = ???``````

In addition we can’t easily reuse collections. Suppose we also have a type-class `Drawable`:

``````class Drawable a where
draw :: a -> IO ()

instance Drawable Circle where ...
instance Drawable Square where ...
instance Drawable Triangle where ...``````

We could write a `drawAll` function by using the same technique:

``````data DrawableShape = forall a. Drawable a => DrawableShape a

instance Drawable DrawableShape where
draw (DrawableShape a) = draw a

drawAll :: [DrawableShape] -> IO ()
drawAll = traverse_ draw``````

Now if we want to resize and then to draw a collection of shapes:

``````resizeAndDraw :: Float -> [?] -> IO ()
resizeAndDraw f = drawAll . resizeAll f``````

We need to introduce another wrapper datatype with its boilerplate:

``````data ResizableDrawableShape = forall a. (Drawable a, Resizable a) => ResizableDrawableShape a

instance Drawable ResizableDrawableShape where
draw (ResizableDrawableShape a) = draw a

instance Resizable ResizableDrawableShape where
resize f (ResizableDrawableShape a) = ResizableDrawableShape (resize f a)

resizeAndDraw :: Float -> [ResizableDrawableShape] -> IO ()
resizeAndDraw f = drawAll . resizeAll f``````

Hence we want to keep the knowledge of the types of the elements in the collection at runtime instead of using a different wrapper type per collection.

Constraint Tuple ADT Existentials
Non-fixed length NO - -
Easily data extensible - NO YES
Easily support constraints on elements - NO YES
Keep knowledge of element types at runtime - YES NO

## Data.Dynamic

`Data.Dynamic` can be used to wrap any value. It allows us to rewrite `resizeAll` as follows:

``````resizeAll :: Float -> [Dynamic] -> [Dynamic]
resizeAll f = fmap (resizeDyn f)

resizeDyn :: Float -> Dynamic -> Dynamic
resizeDyn f d =
case fromDynamic d :: Maybe Circle of
Just c  -> toDyn (resize f c)
Nothing -> case fromDynamic d :: Maybe Square of
Just s  -> toDyn (resize f s)
Nothing -> error "Invalid shape"``````

The issue is that any datatype can be wrapped into a `Dynamic`, hence we lose some type safety. We want to restrict the types of the collection members using some constraints (`Resizable`, etc.).

Constraint Tuple ADT Existentials Dynamic
Non-fixed length NO - - -
Easily data extensible - NO YES YES
Easily support constraints on elements - NO YES NO
Keep knowledge of element types at runtime - YES NO YES

## HList

Instead of wrapping the elements of the collection, we could change the collection type to one that keeps track of the element types at compile time. For instance, HList. However the size of the “list” is fixed at compile time.

Constraint Tuple ADT Existentials Dynamic HList
Non-fixed length at runtime NO - - - NO
Easily data extensible - NO YES YES YES
Easily support constraints on elements - NO YES NO NO
Keep knowledge of element types - YES NO YES YES
Easy to use YES YES YES YES NO

The most basic sum type in Haskell is `Either a b`. It can be used to wrap a value whose type is either `a` or `b`.

Constraint Tuple ADT Existential Dynamic HList Either
Non-fixed length at runtime NO - - - NO -
Easily data extensible - NO YES YES YES NO
Easily support constraints on elements - NO YES NO NO NO
Keep knowledge of element types - YES NO YES YES YES
Easy to use YES YES YES YES NO YES
More than two element types - YES YES YES YES NO

We can generalize it with a GADT such as:

``````data Union (as :: [*]) where
Union :: Either (Union as) a -> Union (a ': as)
-- other implementations are possible...``````

To make any union of resizable types resizable, we can write the following instances:

``````-- required boilerplate instance
instance Resizable (Union '[]) where
resize = undefined

instance (Resizable a, Resizable (Union as)) => Resizable (Union (a ': as)) where
resize f (Union (Right a)) = Union (Right (resize f a))
resize f (Union (Left u))  = Union (Left (resize f u))``````

The issue with this approach is that the actual type of an element is obtained by a linear traversal of the `Either` constructor nest. Moreover, we need to store this constructor nest for each element.

Constraint Tuple ADT Existential Dynamic HList Either GADT
Non-fixed length at runtime NO - - - NO - -
Easily data extensible - NO YES YES YES NO YES
Easily support constraints on elements - NO YES NO NO NO YES
Keep knowledge of element types - YES NO YES YES YES YES
Easy to use YES YES YES YES NO YES YES
More than two element types - YES YES YES YES NO YES
Efficient tag storage/obtention - YES - - - YES NO

## Variant

Could we avoid the suboptimal storage of the “tag” used to determine the actual element type with the GADT approach? It turns out we can: we just need to use a number similarly to the way ADT values are internally stored.

``````data Variant (types :: [*]) = Variant {-# UNPACK #-} !Word Any

type role Variant representational``````

This is how the `Haskus.Utils.Variant` type is defined in my haskus-utils package.

Using a list of `Variant` as our heterogeneous collection, we finally get:

Constraint Tuple ADT Existential Dynamic HList Either GADT Variant
Non-fixed length at runtime NO - - - NO - - -
Easily data extensible - NO YES YES YES NO YES YES
Easily support constraints on elements - NO YES NO NO NO YES YES
Keep knowledge of element types - YES NO YES YES YES YES YES
Easy to use YES YES YES YES NO YES YES YES
More than two element types - YES YES YES YES NO YES YES
Efficient tag storage/obtention - YES - - - YES NO YES

Performance of `Variant` operations should be close to performance of ADTs with one indirection (i.e., constructors with a single (lifted) field). In a previous post, we saw that some operations such as `rewriteAll` in the example above were almost compiled to a direct pattern matching on the variant tag with GHC 8.0.1. My patch should have made this even better in GHC 8.2.1 but sadly it isn’t the case because of #14170… I hope to get it fixed soon.