Control Flow in Haskell (1) - State of the art
Tags: Haskell, Control Flow December 12, 2016

This post is part of a series.

Unchecked Exceptions

One of the good things in Haskell is that function signatures don’t lie in general. Nevertheless they do when we use unchecked exceptions. This is the prototype of the simplest of the exception-catching function catch:

Where does e come from? We can use this function as follows:

Does catch perform some kind of pattern matching depending on the dynamic type of a value (a.k.a. introspection)?

Indeed the Exception type-class has a default fromException method that uses the type-safe cast method that relies on the Typeable type-class. Since this commit, GHC generates a Typeable instance for every data type, hence any data type with a Show instance can have an Exception instance.

From an implementation point of view, catch invokes a GHC primitive (catch#) that pushes a “catch frame” on the STG stack containing a reference to the handler function. When an exception is raised with one of the raise# or raiseIO# primitives, the stack is unwinded down to the topmost “catch frame” (included). Then fromException is used to check that the exception type is valid for the exception handler associated with the “catch frame”: if it is, it is executed, otherwise the exception is raised again. (See Handling Exceptions in Haskell (Alastair Reid) and Asynchronous Exceptions in Haskell (Simon Marlow et al.)).

It means that there is always an exception handler floating around when Haskell code is executed, the first handler being GHC’s “unhandled exception” handler, and that the correct handler is found in O(n) (where n is the number of nested catch).

I think this mechanism is good to have to handle asynchronous exceptions/errors or to implement runtime system level features (e.g., interruptible computations). But I don’t think it is appropriate to use this mechanism to hide “normal”, non-exceptional, control flow paths:

  • There are many possible IO errors that I consider to be non-exceptional and we have to check the documentation of each IO function (if such a thing exists) to see which exceptions may be raised. See for instance the documentation for createDirectory.

    Note that this is another example of abuse of predefined error sets where the set of POSIX errors is mapped with a surjective function into the smaller set of Haskell IO errors.

  • The compiler doesn’t help us if we don’t handle an exception while we should (e.g., the documentation is missing or wrong or if the function has been modified to return a new exception) or if we handle an exception that will never be triggered (dead code).

Sum types

A typical approach to avoid exceptions (or for languages which lack exception mechanism) is to make functions return values indicating which control path to follow. It is the responsibility of the caller function to check this value and to call the appropriate handler code accordingly or to return an error value itself.

For instance, most Linux system calls return negative integral values when an error occurs and a positive value otherwise. We have potentially 2^32 error values to consider on 64-bit architectures and we have to read the documentation to know the potential valid error codes (e.g., see the Errors section in the manual page of mkdir).

Ad-hoc return sum types

In Haskell, we can use a sum type instead of a number and use pattern matching on the returned value, as in the following code:

The pattern match is compiled into a check of the constructor tag (a number) of the r value to know what to do with it. The compiler tells us if we forget to handle a case.

The drawback of this approach is that it doesn’t compose well. Suppose we have two functions doXY and doUV returning values in different “sets” except for the “Error A” and a function doXYUV composing them.

We could use the following code and sum types to encode this:

As a return value cannot be in two different error sets (sum types), we have to alias constructor names, leading to a lot of boilerplate code.

Ad-hoc error sum types

A first step to reduce the boilerplate code is to separate error types from successful types with Either: a single successful value type is assumed per function.

In the special case where several functions have the same set of error values, we can easily compose them. For instance with EitherT:

However it is hard to define an Errors sum type that is both small enough to avoid having to handle error cases that cannot happen and large enough to allow the composition of various functions throwing different kinds of errors.

Moreover fixing an Errors sum type makes it hard to refine errors. For instance the following code may be what we would like to write but we can’t:

Teaser: the next posts of the series show how to write valid similar codes.

Parametric error sum type

A partial solution to avoid fixing the error sum type is to keep it parametric with type classes constraints and to instanciate it only when needed:

The TErr* constraints indicate the required field types for the error sum type. GHC automatically infers and merges these constraints.

A data type declaration with its TErr* type class instances is only needed when we need to match on the error value:

The inconvenient of this method is that it still requires a lot of boilerplate code for type classes and for ad-hoc sum types.

In the next part we see how to define an open sum type that allows us to deal with this issue without any boilerplate code.


Comments