This post is part of a series.
One of the good things in Haskell is that function signatures don’t lie in general. Nevertheless they do when we use unchecked exceptions. This is the prototype of the simplest of the exception-catching function
e come from? We can use this function as follows:
catch perform some kind of pattern matching depending on the dynamic type of a value (a.k.a. introspection)?
Indeed the Exception type-class has a default fromException method that uses the type-safe cast method that relies on the Typeable type-class. Since this commit, GHC generates a Typeable instance for every data type, hence any data type with a
Show instance can have an
From an implementation point of view,
catch invokes a GHC primitive (
catch#) that pushes a “catch frame” on the STG stack containing a reference to the handler function. When an exception is raised with one of the
raiseIO# primitives, the stack is unwinded down to the topmost “catch frame” (included). Then
fromException is used to check that the exception type is valid for the exception handler associated with the “catch frame”: if it is, it is executed, otherwise the exception is raised again. (See Handling Exceptions in Haskell (Alastair Reid) and Asynchronous Exceptions in Haskell (Simon Marlow et al.)).
It means that there is always an exception handler floating around when Haskell code is executed, the first handler being GHC’s “unhandled exception” handler, and that the correct handler is found in O(n) (where n is the number of nested
I think this mechanism is good to have to handle asynchronous exceptions/errors or to implement runtime system level features (e.g., interruptible computations). But I don’t think it is appropriate to use this mechanism to hide “normal”, non-exceptional, control flow paths:
There are many possible IO errors that I consider to be non-exceptional and we have to check the documentation of each IO function (if such a thing exists) to see which exceptions may be raised. See for instance the documentation for createDirectory.
Note that this is another example of abuse of predefined error sets where the set of POSIX errors is mapped with a surjective function into the smaller set of Haskell IO errors.
The compiler doesn’t help us if we don’t handle an exception while we should (e.g., the documentation is missing or wrong or if the function has been modified to return a new exception) or if we handle an exception that will never be triggered (dead code).
A typical approach to avoid exceptions (or for languages which lack exception mechanism) is to make functions return values indicating which control path to follow. It is the responsibility of the caller function to check this value and to call the appropriate handler code accordingly or to return an error value itself.
For instance, most Linux system calls return negative integral values when an error occurs and a positive value otherwise. We have potentially 2^32 error values to consider on 64-bit architectures and we have to read the documentation to know the potential valid error codes (e.g., see the Errors section in the manual page of mkdir).
Ad-hoc return sum types
In Haskell, we can use a sum type instead of a number and use pattern matching on the returned value, as in the following code:
The pattern match is compiled into a check of the constructor tag (a number) of the
r value to know what to do with it. The compiler tells us if we forget to handle a case.
The drawback of this approach is that it doesn’t compose well. Suppose we have two functions
doUV returning values in different “sets” except for the “Error A” and a function
doXYUV composing them.
We could use the following code and sum types to encode this:
data ReturnXY = ErrA | ErrB | ErrC | SuccessXY XY data ReturnUV = ErrA' | ErrD | ErrE | SuccessUV UV data ReturnXYUV = ErrA'' | ErrB' | ErrC' | ErrD' | ErrE' | SuccessXYUV UV doXY :: IO ReturnXY doXY = ... doUV :: XY -> IO ReturnUV doUV = ... doXYUV :: IO ReturnXYUV doXYUV = do mxy <- doXY case mxy of ErrA -> return ErrA'' ErrB -> return ErrB' ErrC -> return ErrC' SuccessXY xy -> do muv <- doUV xy case muv of ErrA' -> return ErrA'' ErrD -> return ErrD' ErrE -> return ErrE' SuccessUV uv -> return SuccessXYUV uv
As a return value cannot be in two different error sets (sum types), we have to alias constructor names, leading to a lot of boilerplate code.
Ad-hoc error sum types
A first step to reduce the boilerplate code is to separate error types from successful types with
Either: a single successful value type is assumed per function.
data ErrorXY = ErrA | ErrB | ErrC data ErrorUV = ErrA' | ErrD | ErrE data ErrorXYUV = ErrA'' | ErrB' | ErrC' | ErrD' | ErrE' doXY :: IO (Either ErrorXY XY) doXY = ... doUV :: XY -> IO (Either ErrorUV UV) doUV = ... doXYUV :: IO (Either ErrorXYUV UV) doXYUV = do mxy <- doXY case mxy of Left ErrA -> return (Left ErrA'') Left ErrB -> return (Left ErrB') Left ErrC -> return (Left ErrC') Right xy -> do muv <- doUV xy case muv of Left ErrA' -> return (Left ErrA'') Left ErrD -> return (Left ErrD') Left ErrE -> return (Left ErrE') Right uv -> return (Right uv)
In the special case where several functions have the same set of error values, we can easily compose them. For instance with EitherT:
However it is hard to define an
Errors sum type that is both small enough to avoid having to handle error cases that cannot happen and large enough to allow the composition of various functions throwing different kinds of errors.
Moreover fixing an
Errors sum type makes it hard to refine errors. For instance the following code may be what we would like to write but we can’t:
data IOErrors = FileNotFound | DiskError | ... readFile :: FilePath -> Either IOErrors Text ... readInterfaceFile path = do readFile path >>= \case Left FileNotFound -> return (Left InterfaceFileNotFound) e -> return e readSourceFile path = do readFile path >>= \case Left FileNotFound -> return (Left SourceFileNotFound) e -> return e
Teaser: the next posts of the series show how to write valid similar codes.
Parametric error sum type
A partial solution to avoid fixing the error sum type is to keep it parametric with type classes constraints and to instanciate it only when needed:
class TErrA e where throwErrA :: ErrA -> e class TErrB e where throwErrB :: ErrB -> e class TErrC e where throwErrC :: ErrC -> e class TErrD e where throwErrD :: ErrD -> e class TErrE e where throwErrE :: ErrE -> e doXY :: (TErrA e, TErrB e, TErrC e) => IO (Either e XY) doXY = ... -- use throwErrA, throwErrB and throwErrC doUV :: (TErrA e, TErrD e, TErrE e) => XY -> IO (Either e UV) doUV = ... -- use throwErrA, throwErrD and throwErrE -- the union of the constraints is automatically inferred by GHC doXYUV :: (TErrA e, TErrB e, TErrC e, TErrD e, TErrE) => IO (Either e UV) doXYUV = runEitherT $ do xy <- doXY doUV xy
TErr* constraints indicate the required field types for the error sum type. GHC automatically infers and merges these constraints.
A data type declaration with its
TErr* type class instances is only needed when we need to match on the error value:
data MyError = MyErrA ErrA | MyErrB ErrB | MyErrC ErrC | MyErrD ErrD | MyErrE ErrE instance TErrA MyError where throwErrA = MyErrA instance TErrB MyError where throwErrB = MyErrB instance TErrC MyError where throwErrC = MyErrC instance TErrD MyError where throwErrD = MyErrD instance TErrE MyError where throwErrE = MyErrE main :: IO () main = do muv <- doXYUV case muv of Right uv -> ... Left (MyErrA ErrA) -> ... Left (MyErrB ErrB) -> ... Left (MyErrC ErrC) -> ... ...
The inconvenient of this method is that it still requires a lot of boilerplate code for type classes and for ad-hoc sum types.
In the next part we see how to define an open sum type that allows us to deal with this issue without any boilerplate code.