Heterogeneous collections
Tags: Haskell, Variant, Heterogeneous collections February 21, 2018

This post is about managing collections (e.g., lists) of heterogeneous data types in Haskell.

Summary: in my opinion, using the data type I’ve called Variant to handle heterogeneous collections is currently the best alternative. It is type-safe, efficient (both storage-wise and performance-wise) and easy to use.

The problem

Suppose we have several algebraic data types for different geometric shapes:

and that we know how to resize them by a factor:

We can write a function to resize a list of shapes:

Let’s test it on a collection of circles:

Now on a collection of squares:

And finally on a collection mixing circles and squares:

Argh! We forgot that lists can only contain elements having the same type! We would like an heterogeneous collection that can contain both squares and circles (and maybe more shapes)!

Heterogeneous Collections

We can use the Variant data type (or its alias V) and the pattern synonym V to wrap our values:

We can rewrite resizeAll as follows:

It works on any list of variants as long as all the possible types in the heterogeneous list (i.e., types in cs) have an associated Resizable instance.

Let’s test it:

We can check that type-checking works as expected by using this function on shapes2:

Indeed the type of shapes2 indicates that it may contain triangles but Triangle has no Resizable instance!

Another approach is to create an instance of Resizable for any Variant cs as long as every type in cs has a Resizable instance. It is defined recursively as follows:

We can now use the original resizeAll function:

We use this mechanism to provide Eq, Ord and Show instances. That’s how GHCI shows the variant contents.

We can easily compose functions by “composing” the constraints on the collection types:

We can map a function to modify only values having some given type:

We can also map to a Variant and fold the result:

As you can see, it is possible to have the same type more than once in a Variant. It is useful if you “select” types by index, e.g., fromVariantAt @2 v (cf. *At functions in Haddock).

You can also decide to keep a single entry per type with nubVariant:

You can also extend or reorder the type list of a Variant with liftVariant:

Obviously, we are not allowed to remove a type from the list (considered as a set):

Let’s try to extract some shapes from the collection depending on their type:

Indeed no Triangle can be in the shapes collection as it is defined to hold only circles and squares.

We may want to write a generic function that filters any collection, even if we statically know that it can’t contain the given type. Let’s use fromVariantMaybe function and MaybePopable constraint for this:

Instead of keeping values having a given type, it may be more useful to filter out some types:

Now let’s try some pattern-matching:

  • V pattern synonym and Popable constraint are used for types that must be storable in the variant.
  • VMaybe pattern synonym and MaybePopable constraint are used for types that may not be storable in the variant.

Sadly when we use pattern matching on a Variant, the compiler can’t detect if we match all the possible types in the Variant. Hence we need a failover wildcard match to avoid a “non-exhaustive pattern match” warning:

Maybe we could circumvent this with a compiler plugin (left as an exercise for the motivated reader).

Conclusion

I hope this introduction to the use of the Haskus.Utils.Variant type has been interesting. You can find it in the haskus-utils package either on GitHub or on Hackage. Feedback is welcome!

There was already a brief description of the use of the Variant data type to handle heterogeneous collections on this blog buried at the end of this previous post about control-flow. Nevertheless, after reading this recent post about using existentials, Typeable, Dynamic and the like to manage heterogeneous collections in Haskell, I figured it deserved its own post. It led to a major API update and to a new release of haskus-utils package: I should write posts more often.

The first drafts of this post started with a presentation of the usual approaches to heterogeneous collections (i.e., the state-of-the-art). It was a bit boring and many readers would probably have left without reaching the Variant part, hence the state-of-the-art is now presented in the annexe of this post.


Annexe: alternatives and comparison

The wiki lists several solutions to the “heterogeneous collections” problem:

  • Tuples
  • Algebraic datatypes
  • Data.Dynamic
  • Existential types
  • HLists

To which we should add:

  • recursive GADT
  • Variant

Let’s compare them!

Tuples

We don’t want a statically fixed length collection, hence we exclude tuples.

Constraint Tuple
Non-fixed length NO

Algebraic datatypes

The idea is to create a wrapper Shape datatype:

We can write an instance:

This is not easily extensible: if we want to support a new shape (e.g., a triangle), we have to modify both the Shape datatype and its Resizable instance:

Constraint Tuple ADT
Non-fixed length NO -
Easily data extensible - NO

Now suppose that for some reason we want to statically ensure that a collection only contains shapes with a central symmetry (e.g., squares and circles but not triangles). The only way is to have different shape types for the different constraints on the shapes:

But now we can’t resize elements of this collection anymore, so we need:

Now adding a new shape is even more cumbersome because we potentially have to modify several datatypes and their instances.

Constraint Tuple ADT
Non-fixed length NO -
Easily data extensible - NO
Easily support constraints on elements - NO

Existential types

We can use existential types to embed some type-classes alongside some values. For instance we could rewrite our previous datatype as follows:

Constraint Tuple ADT Existentials
Non-fixed length NO - -
Easily data extensible - NO YES
Easily support constraints on elements - NO YES

The problem is that we have lost the knowledge of the actual types in the collection: we only know that they are Resizable. For instance we can’t easily write a function extracting the circles:

In addition we can’t easily reuse collections. Suppose we also have a type-class Drawable:

We could write a drawAll function by using the same technique:

Now if we want to resize and then to draw a collection of shapes:

We need to introduce another wrapper datatype with its boilerplate:

Hence we want to keep the knowledge of the types of the elements in the collection at runtime instead of using a different wrapper type per collection.

Constraint Tuple ADT Existentials
Non-fixed length NO - -
Easily data extensible - NO YES
Easily support constraints on elements - NO YES
Keep knowledge of element types at runtime - YES NO

Data.Dynamic

Data.Dynamic can be used to wrap any value. It allows us to rewrite resizeAll as follows:

The issue is that any datatype can be wrapped into a Dynamic, hence we lose some type safety. We want to restrict the types of the collection members using some constraints (Resizable, etc.).

Constraint Tuple ADT Existentials Dynamic
Non-fixed length NO - - -
Easily data extensible - NO YES YES
Easily support constraints on elements - NO YES NO
Keep knowledge of element types at runtime - YES NO YES

HList

Instead of wrapping the elements of the collection, we could change the collection type to one that keeps track of the element types at compile time. For instance, HList. However the size of the “list” is fixed at compile time.

Constraint Tuple ADT Existentials Dynamic HList
Non-fixed length at runtime NO - - - NO
Easily data extensible - NO YES YES YES
Easily support constraints on elements - NO YES NO NO
Keep knowledge of element types - YES NO YES YES
Easy to use YES YES YES YES NO

Recursive GADT

The most basic sum type in Haskell is Either a b. It can be used to wrap a value whose type is either a or b.

Constraint Tuple ADT Existential Dynamic HList Either
Non-fixed length at runtime NO - - - NO -
Easily data extensible - NO YES YES YES NO
Easily support constraints on elements - NO YES NO NO NO
Keep knowledge of element types - YES NO YES YES YES
Easy to use YES YES YES YES NO YES
More than two element types - YES YES YES YES NO

We can generalize it with a GADT such as:

To make any union of resizable types resizable, we can write the following instances:

The issue with this approach is that the actual type of an element is obtained by a linear traversal of the Either constructor nest. Moreover, we need to store this constructor nest for each element.

Constraint Tuple ADT Existential Dynamic HList Either GADT
Non-fixed length at runtime NO - - - NO - -
Easily data extensible - NO YES YES YES NO YES
Easily support constraints on elements - NO YES NO NO NO YES
Keep knowledge of element types - YES NO YES YES YES YES
Easy to use YES YES YES YES NO YES YES
More than two element types - YES YES YES YES NO YES
Efficient tag storage/obtention - YES - - - YES NO

Variant

Could we avoid the suboptimal storage of the “tag” used to determine the actual element type with the GADT approach? It turns out we can: we just need to use a number similarly to the way ADT values are internally stored.

This is how the Haskus.Utils.Variant type is defined in my haskus-utils package.

Using a list of Variant as our heterogeneous collection, we finally get:

Constraint Tuple ADT Existential Dynamic HList Either GADT Variant
Non-fixed length at runtime NO - - - NO - - -
Easily data extensible - NO YES YES YES NO YES YES
Easily support constraints on elements - NO YES NO NO NO YES YES
Keep knowledge of element types - YES NO YES YES YES YES YES
Easy to use YES YES YES YES NO YES YES YES
More than two element types - YES YES YES YES NO YES YES
Efficient tag storage/obtention - YES - - - YES NO YES

Performance of Variant operations should be close to performance of ADTs with one indirection (i.e., constructors with a single (lifted) field). In a previous post, we saw that some operations such as rewriteAll in the example above were almost compiled to a direct pattern matching on the variant tag with GHC 8.0.1. My patch should have made this even better in GHC 8.2.1 but sadly it isn’t the case because of #14170… I hope to get it fixed soon.