Tags: Haskell, Variant, Heterogeneous collections | February 21, 2018 |
This post is about managing collections (e.g., lists) of heterogeneous data types in Haskell.
Summary: in my opinion, using the data type I’ve called Variant
to handle heterogeneous collections is currently the best alternative. It is type-safe, efficient (both storage-wise and performance-wise) and easy to use.
The problem
Suppose we have several algebraic data types for different geometric shapes:
data Circle = Circle
{ circleRadius :: Float
, circleCenter :: (Float,Float)
}
deriving (Show)
data Square = Square
{ squareTopLeft :: (Float,Float)
, squareWidth :: Float
}
deriving (Show)
and that we know how to resize them by a factor:
class Resizable a where
resize :: Float -> a -> a
instance Resizable Circle where
resize f (Circle r c) = Circle (f*r) c
instance Resizable Square where
resize f (Square tl w) = Square tl (f*w)
We can write a function to resize a list of shapes:
Let’s test it on a collection of circles:
> resizeAll 2 [Circle 5 (1,2), Circle 4 (2,3)]
[Circle {circleRadius = 10.0, circleCenter = (1.0,2.0)}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}]
Now on a collection of squares:
> resizeAll 2 [Square (1,2) 5, Square (2,3) 4]
[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 8.0}]
And finally on a collection mixing circles and squares:
> resizeAll 2 [Square (1,2) 5, Circle 4 (2,3)]
<interactive>:10:30: error:
• Couldn't match expected type ‘Square’ with actual type ‘Circle’
• In the expression: Circle 4 (2, 3)
In the second argument of ‘resizeAll’, namely
‘[Square (1, 2) 5, Circle 4 (2, 3)]’
In the expression: resizeAll 2 [Square (1, 2) 5, Circle 4 (2, 3)]
Argh! We forgot that lists can only contain elements having the same type! We would like an heterogeneous collection that can contain both squares and circles (and maybe more shapes)!
Heterogeneous Collections
We can use the Variant
data type (or its alias V
) and the pattern synonym V
to wrap our values:
import Haskus.Utils.Variant
-- a collection of circles and squares
shapes :: [V '[Circle,Square]]
shapes = [ V $ Square (1,2) 5
, V $ Circle 4 (2,3)
, V $ Circle 1 (5,7)
]
data Triangle = Triangle deriving (Show)
-- a collection of circles, squares and triangles
shapes2 :: [V '[Circle,Square,Triangle]]
shapes2 = [ V $ Square (1,2) 5
, V $ Circle 4 (2,3)
, V $ Triangle
, V $ Circle 1 (5,7)
]
We can rewrite resizeAll
as follows:
resizeAllV :: AlterVariant Resizable cs => Float -> [V cs] -> [V cs]
resizeAllV f = fmap (alterVariant @Resizable (resize f))
It works on any list of variants as long as all the possible types in the heterogeneous list (i.e., types in cs
) have an associated Resizable
instance.
Let’s test it:
> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> resizeAllV 2 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
,Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}]
We can check that type-checking works as expected by using this function on shapes2
:
> resizeAllV 2 shapes2
<interactive>:3:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAllV’
• In the expression: resizeAllV 2 shapes2
In an equation for ‘it’: it = resizeAllV 2 shapes2
Indeed the type of shapes2
indicates that it may contain triangles but Triangle
has no Resizable
instance!
Another approach is to create an instance of Resizable
for any Variant cs
as long as every type in cs
has a Resizable
instance. It is defined recursively as follows:
-- this instance is never used but it is necessary for the type-checker
instance Resizable (V '[]) where
{-# INLINE resize #-}
resize = undefined
instance (Resizable (V xs), Resizable x) => Resizable (V (x ': xs)) where
{-# INLINE resize #-}
resize f v = case popVariantHead v of
Right x -> toVariantHead (resize f x)
Left xs -> toVariantTail (resize f xs)
We can now use the original resizeAll
function:
resizeAll :: Resizable a => Float -> [a] -> [a]
resizeAll f = fmap (resize f)
> resizeAll 3 shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 15.0}
,Circle {circleRadius = 12.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 3.0, circleCenter = (5.0,7.0)}
]
> resizeAll 3 shapes2
<interactive>:66:1: error:
• No instance for (Resizable Triangle)
arising from a use of ‘resizeAll’
• In the expression: resizeAll 3 shapes2
In an equation for ‘it’: it = resizeAll 3 shapes2
We use this mechanism to provide Eq
, Ord
and Show
instances. That’s how GHCI shows the variant contents.
We can easily compose functions by “composing” the constraints on the collection types:
import Data.Foldable (traverse_)
-- print an heterogeneous collection
printAll :: TraverseVariant Show cs IO => [V cs] -> IO ()
printAll = traverse_ (traverseVariant_ @Show print)
resizeThenPrint :: (AlterVariant Resizable cs, TraverseVariant Show cs IO)
=> Float -> [V cs] -> IO ()
resizeThenPrint f = printAll . resizeAllV f
> resizeThenPrint 2 shapes
Square {squareTopLeft = (1.0,2.0), squareWidth = 10.0}
Circle {circleRadius = 8.0, circleCenter = (2.0,3.0)}
Circle {circleRadius = 2.0, circleCenter = (5.0,7.0)}
We can map a function to modify only values having some given type:
translateCircle :: Float -> Float -> Circle -> Circle
translateCircle fx fy (Circle r (x,y)) = Circle r (fx+x,fy+y)
> fmap (mapVariant (translateCircle 50 70)) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (52.0,73.0)}
,Circle {circleRadius = 1.0, circleCenter = (55.0,77.0)}
]
We can also map to a Variant
and fold the result:
transformCircle :: Circle -> V '[Square,Triangle]
transformCircle (Circle r xy)
| r > 3 = V (Square xy r)
| otherwise = V Triangle
> shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> fmap (foldMapVariant transformCircle) shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Square {squareTopLeft = (2.0,3.0), squareWidth = 4.0}
,Triangle]
> :t fmap (foldMapVariant transformCircle) shapes
fmap (foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle, Square]]
As you can see, it is possible to have the same type more than once in a Variant
. It is useful if you “select” types by index, e.g., fromVariantAt @2 v
(cf. *At
functions in Haddock).
You can also decide to keep a single entry per type with nubVariant
:
> :t fmap (nubVariant . foldMapVariant transformCircle) shapes
fmap (nubVariant . foldMapVariant transformCircle) shapes
:: [V '[Square, Triangle]]
You can also extend or reorder the type list of a Variant
with liftVariant
:
> shapes3 = fmap liftVariant shapes :: [V '[Square,Int,Circle,String]]
> shapes3 ++ [V "Hey!", V (10 :: Int)]
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
,"Hey!"
,10
]
Obviously, we are not allowed to remove a type from the list (considered as a set):
> fmap liftVariant shapes :: [V '[Square,Int,String]]
<interactive>:112:6: error:
• `Circle' is not a member of '[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]
<interactive>:112:6: error:
• '[Circle, Square]
is not a subset of
'[Square, Int, String]
• In the first argument of ‘fmap’, namely ‘liftVariant’
In the expression:
fmap liftVariant shapes :: [V '[Square, Int, String]]
In an equation for ‘it’:
it = fmap liftVariant shapes :: [V '[Square, Int, String]]
Let’s try to extract some shapes from the collection depending on their type:
import Data.Maybe (mapMaybe)
-- extract values having type "c" from the collection ("cs" must contain "c")
filterColl :: forall c cs. Popable c cs => [V cs] -> [c]
filterColl = mapMaybe (fromVariant @c)
> filterColl @Circle shapes
[Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}]
> filterColl @Square shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]
> filterColl @Triangle shapes
<interactive>:44:1: error:
• `Triangle' is not a member of '[Circle, Square]
• In the expression: filterColl @Triangle shapes
In an equation for ‘it’: it = filterColl @Triangle shapes
Indeed no Triangle
can be in the shapes
collection as it is defined to hold only circles and squares.
We may want to write a generic function that filters any collection, even if we statically know that it can’t contain the given type. Let’s use fromVariantMaybe
function and MaybePopable
constraint for this:
-- extract values having type "c" from the collection ("cs" may not contain "c")
filterCollMaybe :: forall c cs. MaybePopable c cs => [V cs] -> [c]
filterCollMaybe = mapMaybe (fromVariantMaybe @c)
> filterCollMaybe @Triangle shapes
[]
> filterCollMaybe @Triangle shapes2
[Triangle]
> filterCollMaybe @Int shapes2
[]
Instead of keeping values having a given type, it may be more useful to filter out some types:
import Data.Either (partitionEithers)
removeAll :: forall c cs. MaybePopable c cs => [V cs] -> [V (Filter c cs)]
removeAll = fst . partitionEithers . fmap (popVariantMaybe @c)
> removeAll @Triangle shapes
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]
> removeAll @Triangle shapes2
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}
,Circle {circleRadius = 4.0, circleCenter = (2.0,3.0)}
,Circle {circleRadius = 1.0, circleCenter = (5.0,7.0)}
]
> :t removeAll @Triangle shapes2
removeAll @Triangle shapes2 :: [V '[Circle, Square]] -- no more Triangle
> removeAll @Circle (removeAll @Triangle shapes2)
[Square {squareTopLeft = (1.0,2.0), squareWidth = 5.0}]
Now let’s try some pattern-matching:
matchShapes :: (Popable Circle cs
,Popable Square cs
,MaybePopable Triangle cs
) => V cs -> String
matchShapes = \case
V (Circle r _) -> "circle radius: " ++ show r
V (Square _ w) -> "square width: " ++ show w
VMaybe (x :: Triangle) -> "triangle: " ++ show x
_ -> "unknown shape"
> fmap matchShapes shapes
["square width: 5.0"
,"circle radius: 4.0"
,"circle radius: 1.0"
]
> fmap matchShapes shapes2
["square width: 5.0"
,"circle radius: 4.0"
,"triangle: Triangle"
,"circle radius: 1.0"
]
V
pattern synonym andPopable
constraint are used for types that must be storable in the variant.VMaybe
pattern synonym andMaybePopable
constraint are used for types that may not be storable in the variant.
Sadly when we use pattern matching on a Variant
, the compiler can’t detect if we match all the possible types in the Variant
. Hence we need a failover wildcard match to avoid a “non-exhaustive pattern match” warning:
matchShapes' :: V '[Circle,Square] -> String
matchShapes' = \case
V (Circle r _) -> "circle radius: " ++ show r
V (Square _ w) -> "square width: " ++ show w
_ -> "unknown shape" -- required useless failover match
Maybe we could circumvent this with a compiler plugin (left as an exercise for the motivated reader).
Conclusion
I hope this introduction to the use of the Haskus.Utils.Variant
type has been interesting. You can find it in the haskus-utils
package either on GitHub or on Hackage. Feedback is welcome!
There was already a brief description of the use of the Variant
data type to handle heterogeneous collections on this blog buried at the end of this previous post about control-flow. Nevertheless, after reading this recent post about using existentials, Typeable
, Dynamic
and the like to manage heterogeneous collections in Haskell, I figured it deserved its own post. It led to a major API update and to a new release of haskus-utils
package: I should write posts more often.
The first drafts of this post started with a presentation of the usual approaches to heterogeneous collections (i.e., the state-of-the-art). It was a bit boring and many readers would probably have left without reaching the Variant
part, hence the state-of-the-art is now presented in the annexe of this post.
Annexe: alternatives and comparison
The wiki lists several solutions to the “heterogeneous collections” problem:
- Tuples
- Algebraic datatypes
- Data.Dynamic
- Existential types
- HLists
To which we should add:
- recursive GADT
- Variant
Let’s compare them!
Tuples
We don’t want a statically fixed length collection, hence we exclude tuples.
Constraint | Tuple |
---|---|
Non-fixed length | NO |
Algebraic datatypes
The idea is to create a wrapper Shape
datatype:
We can write an instance:
instance Resizable Shape where
resize f (ShapeCircle c) = ShapeCircle (resize f c)
resize f (ShapeSquare s) = ShapeSquare (resize f s)
This is not easily extensible: if we want to support a new shape (e.g., a triangle), we have to modify both the Shape
datatype and its Resizable
instance:
data Shape
= ShapeCircle Circle
| ShapeSquare Square
| ShapeTriangle Triangle
instance Resizable Shape where
..
resize (ShapeTriangle ...) = ...
Constraint | Tuple | ADT |
---|---|---|
Non-fixed length | NO | - |
Easily data extensible | - | NO |
Now suppose that for some reason we want to statically ensure that a collection only contains shapes with a central symmetry (e.g., squares and circles but not triangles). The only way is to have different shape types for the different constraints on the shapes:
data CentralSymShape
= CentralSymCircle Circle
| CentralSymSquare Square
fooAll :: [CentralSymShape] -> [CentralSymShape]
But now we can’t resize elements of this collection anymore, so we need:
Now adding a new shape is even more cumbersome because we potentially have to modify several datatypes and their instances.
Constraint | Tuple | ADT |
---|---|---|
Non-fixed length | NO | - |
Easily data extensible | - | NO |
Easily support constraints on elements | - | NO |
Existential types
We can use existential types to embed some type-classes alongside some values. For instance we could rewrite our previous datatype as follows:
data Shape = forall a. Resizable a => Shape a
instance Resizable Shape where
resize f (Shape a) = Shape (resize f a)
Constraint | Tuple | ADT | Existentials |
---|---|---|---|
Non-fixed length | NO | - | - |
Easily data extensible | - | NO | YES |
Easily support constraints on elements | - | NO | YES |
The problem is that we have lost the knowledge of the actual types in the collection: we only know that they are Resizable
. For instance we can’t easily write a function extracting the circles:
In addition we can’t easily reuse collections. Suppose we also have a type-class Drawable
:
class Drawable a where
draw :: a -> IO ()
instance Drawable Circle where ...
instance Drawable Square where ...
instance Drawable Triangle where ...
We could write a drawAll
function by using the same technique:
data DrawableShape = forall a. Drawable a => DrawableShape a
instance Drawable DrawableShape where
draw (DrawableShape a) = draw a
drawAll :: [DrawableShape] -> IO ()
drawAll = traverse_ draw
Now if we want to resize and then to draw a collection of shapes:
We need to introduce another wrapper datatype with its boilerplate:
data ResizableDrawableShape = forall a. (Drawable a, Resizable a) => ResizableDrawableShape a
instance Drawable ResizableDrawableShape where
draw (ResizableDrawableShape a) = draw a
instance Resizable ResizableDrawableShape where
resize f (ResizableDrawableShape a) = ResizableDrawableShape (resize f a)
resizeAndDraw :: Float -> [ResizableDrawableShape] -> IO ()
resizeAndDraw f = drawAll . resizeAll f
Hence we want to keep the knowledge of the types of the elements in the collection at runtime instead of using a different wrapper type per collection.
Constraint | Tuple | ADT | Existentials |
---|---|---|---|
Non-fixed length | NO | - | - |
Easily data extensible | - | NO | YES |
Easily support constraints on elements | - | NO | YES |
Keep knowledge of element types at runtime | - | YES | NO |
Data.Dynamic
Data.Dynamic
can be used to wrap any value. It allows us to rewrite resizeAll
as follows:
resizeAll :: Float -> [Dynamic] -> [Dynamic]
resizeAll f = fmap (resizeDyn f)
resizeDyn :: Float -> Dynamic -> Dynamic
resizeDyn f d =
case fromDynamic d :: Maybe Circle of
Just c -> toDyn (resize f c)
Nothing -> case fromDynamic d :: Maybe Square of
Just s -> toDyn (resize f s)
Nothing -> error "Invalid shape"
The issue is that any datatype can be wrapped into a Dynamic
, hence we lose some type safety. We want to restrict the types of the collection members using some constraints (Resizable
, etc.).
Constraint | Tuple | ADT | Existentials | Dynamic |
---|---|---|---|---|
Non-fixed length | NO | - | - | - |
Easily data extensible | - | NO | YES | YES |
Easily support constraints on elements | - | NO | YES | NO |
Keep knowledge of element types at runtime | - | YES | NO | YES |
HList
Instead of wrapping the elements of the collection, we could change the collection type to one that keeps track of the element types at compile time. For instance, HList. However the size of the “list” is fixed at compile time.
Constraint | Tuple | ADT | Existentials | Dynamic | HList |
---|---|---|---|---|---|
Non-fixed length at runtime | NO | - | - | - | NO |
Easily data extensible | - | NO | YES | YES | YES |
Easily support constraints on elements | - | NO | YES | NO | NO |
Keep knowledge of element types | - | YES | NO | YES | YES |
Easy to use | YES | YES | YES | YES | NO |
Recursive GADT
The most basic sum type in Haskell is Either a b
. It can be used to wrap a value whose type is either a
or b
.
Constraint | Tuple | ADT | Existential | Dynamic | HList | Either |
---|---|---|---|---|---|---|
Non-fixed length at runtime | NO | - | - | - | NO | - |
Easily data extensible | - | NO | YES | YES | YES | NO |
Easily support constraints on elements | - | NO | YES | NO | NO | NO |
Keep knowledge of element types | - | YES | NO | YES | YES | YES |
Easy to use | YES | YES | YES | YES | NO | YES |
More than two element types | - | YES | YES | YES | YES | NO |
We can generalize it with a GADT such as:
data Union (as :: [*]) where
Union :: Either (Union as) a -> Union (a ': as)
-- other implementations are possible...
To make any union of resizable types resizable, we can write the following instances:
-- required boilerplate instance
instance Resizable (Union '[]) where
resize = undefined
instance (Resizable a, Resizable (Union as)) => Resizable (Union (a ': as)) where
resize f (Union (Right a)) = Union (Right (resize f a))
resize f (Union (Left u)) = Union (Left (resize f u))
The issue with this approach is that the actual type of an element is obtained by a linear traversal of the Either
constructor nest. Moreover, we need to store this constructor nest for each element.
Constraint | Tuple | ADT | Existential | Dynamic | HList | Either | GADT |
---|---|---|---|---|---|---|---|
Non-fixed length at runtime | NO | - | - | - | NO | - | - |
Easily data extensible | - | NO | YES | YES | YES | NO | YES |
Easily support constraints on elements | - | NO | YES | NO | NO | NO | YES |
Keep knowledge of element types | - | YES | NO | YES | YES | YES | YES |
Easy to use | YES | YES | YES | YES | NO | YES | YES |
More than two element types | - | YES | YES | YES | YES | NO | YES |
Efficient tag storage/obtention | - | YES | - | - | - | YES | NO |
Variant
Could we avoid the suboptimal storage of the “tag” used to determine the actual element type with the GADT approach? It turns out we can: we just need to use a number similarly to the way ADT values are internally stored.
This is how the Haskus.Utils.Variant
type is defined in my haskus-utils package.
Using a list of Variant
as our heterogeneous collection, we finally get:
Constraint | Tuple | ADT | Existential | Dynamic | HList | Either | GADT | Variant |
---|---|---|---|---|---|---|---|---|
Non-fixed length at runtime | NO | - | - | - | NO | - | - | - |
Easily data extensible | - | NO | YES | YES | YES | NO | YES | YES |
Easily support constraints on elements | - | NO | YES | NO | NO | NO | YES | YES |
Keep knowledge of element types | - | YES | NO | YES | YES | YES | YES | YES |
Easy to use | YES | YES | YES | YES | NO | YES | YES | YES |
More than two element types | - | YES | YES | YES | YES | NO | YES | YES |
Efficient tag storage/obtention | - | YES | - | - | - | YES | NO | YES |
Performance of Variant
operations should be close to performance of ADTs with one indirection (i.e., constructors with a single (lifted) field). In a previous post, we saw that some operations such as rewriteAll
in the example above were almost compiled to a direct pattern matching on the variant tag with GHC 8.0.1. My patch should have made this even better in GHC 8.2.1 but sadly it isn’t the case because of #14170… I hope to get it fixed soon.