First Experiments With Dependent Types In GHC

Tags: Haskell, GHC, Dependent Types, FFI, haskus-system

March 18, 2016

Update 2016-12-28: rename ViperVM into Haskus system and fix links accordingly

For two weeks I have been getting my feet wet with GHC’s type extensions: DataKinds, TypeFamilies, TypeOperators, etc.

I rely a lot on the Foreign Function Interface (FFI) in my Haskus system project (formerly ViperVM) and I wanted to easily and efficiently map Haskell data types to C data types (unions, arrays, bitsets, bitfields, etc.). Last week I have put together a coherent set of modules to do it, heavily based on GHC’s type extensions. It is not yet documented on this blog, but you can read about it here: Writing bindings

Playing with type extensions gave me a lot of food for thought, hence this article about using them to enhance the way we do error checking and other control-flow stuff.

Control-flow

I am now trying to solve another issue: control-flow in the IO monad with error handling. For instance, if you have read this page, you know that most Linux’s syscalls can either fail and return an error, or succeed and return a value. E.g.:

pipe :: IO (Either Error (FileDescriptor, FileDescriptor))

I will use the following simpler functions to illustrate this article:

f :: IO (Either Error A)
g :: A -> IO (Either Error B)
h :: B -> IO (Either Error C)

EitherT

To compose these calls, it is common to use the EitherT monad transformer.

import Control.Monad.Trans.Either

result :: IO (Either Error C)
result = runEitherT $ do
   a <- EitherT f
   b <- EitherT (g a)
   c <- EitherT (h b)
   return c

-- or
result :: IO (Either Error C)
result = runEitherT (EitherT f >>= (EitherT . g) >>= (EitherT . h))

The corresponding data-flow diagram is then:

Drawbacks

I think that this composition approach, however, is quite limiting:

all the functions must share the same Error type
when we get an Error, we don’t know which of the function has failed

Suppose we have instead a different error type per function:

f :: IO (Either ErrorF A)
g :: A -> IO (Either ErrorG B)
h :: B -> IO (Either ErrorH C)

We can somehow circumvent the issues mentionned above by enhancing the Error data type so that it can contain the different kinds of error. We could also add an identifier indicating the function which has triggered the error (I haven’t done it here to keep things simple).

data Error
   = ErrF ErrorF
   | ErrG ErrorG
   | ErrH ErrorH
   | ....

This method doesn’t scale well. When we use Error in different places, we never know the real subset of errors that can happen in practice. For instance, in the following example, the type of result2 doesn’t tell us that the ErrorH error will never happen while it may happen in result.

result :: IO (Either Error C)
result = runEitherT $ do
   a <- EitherT (mapLeft ErrF <$> f)
   b <- EitherT (mapLeft ErrG <$> g a)
   c <- EitherT (mapLeft ErrH <$> h b)
   return c

result2 :: IO (Either Error B)
result2 = runEitherT $ do
   a <- EitherT (mapLeft ErrF <$> f)
   b <- EitherT (mapLeft ErrG <$> g a)
   return b

(It is similar to errno in C: it can be any int value so we never know if we have tested all the possible error cases for a given syscall).

Handling errors is already very boring, we don’t want to handle errors that will never happen and we don’t want to create new ErrorXXX data types for each error subset due to composition. Instead, we would like to compose the functions to get:

That is, result should have an extended Either type that can contain an arbitrary number of different types: here C, ErrorF, ErrorG and ErrorH. And we want the help of the compiler to ensure that we don’t forget to handle some errors: we want to pattern match on result so that the compiler signals missing cases.

Variant

I have implemented a Variant type that supports an arbitrary number of content types. Similarly to data types (as implemented in GHC as far as I know), a hidden tag number identifies the actual type of the contained value. The value can be obtained/set with a type index (a Nat) that is used both to index into the content types and to check/set the tag value. It is much more simple to understand with a few examples:

{-# LANGUAGE DataKinds #-}

-- creating a new variant: using an Int (index 1)
v :: Variant '[String,Int,Float]
v = setVariant (Proxy :: Proxy 1) 10
-- or simply
v = setVariant1 10

-- trying to get a String from the variant (index 0)
> getVariant0 v
Nothing

-- trying to get an Int from the variant (index 1)
> getVariant1 v
Just 10

-- trying to get a Float from the variant (index 2)
> getVariant2 v
Nothing

-- trying with an out-of-bound index
> getVariant3 v -- doesn't compile!

-- Creating a variant from a String
w :: Variant '[String,Int,Float]
w = setVariant0 "Hello"

-- Updating a variant value (if it has the correct tag of course)
w' :: Variant '[String,Int,Float]
w' = updateVariant0 (++ " World!") w

-- Updating a value and changing the type
v' :: Variant '[String,String,Float]
v' = updateVariant1 (\x -> "Value: " ++ show x) v

-- Converting a Variant into a Tuple
> matchVariant w'
(Just "Hello World!",Nothing,Nothing)

> matchVariant v'
(Nothing,Just "Value: 10",Nothing)

-- We can also store functions, etc.
z :: Variant '[String,Int,Float,Int->Int]
z = setVariant3 (+1)

> fmap ($ 2) (getVariant3 z)
Just 3

Flow

I have made a little control-flow module that deals with functions returning Variants. Now instead of returning Either Error a, we can make our functions return a Variant '[a,Error] and we can compose them with flowSeq as in the following example:

{-# LANGUAGE DataKinds #-}

import Haskus.Utils.Variant
import Haskus.Utils.Flow

data ErrorF = ErrorF deriving (Show)
data ErrorG = ErrorG deriving (Show)
data ErrorH = ErrorH deriving (Show)

f :: IO (Variant '[Int,ErrorF])
f = return (setVariant0 10)

g :: Int -> IO (Variant '[Int,ErrorG])
g x = do
   print x
   --fail!
   return (setVariant1 ErrorG)

h :: Int -> IO (Variant '[Int,ErrorH])
h x = do
   print x
   return (setVariant0 (x+20))

result =
   f
   `flowSeq` g
   `flowSeq` h

> :t result
result :: IO (Variant '[Int, ErrorH, ErrorG, ErrorF])

> matchVariant <$> result
10
(Nothing,Nothing,Just ErrorG,Nothing)

Notice that by convention, the first value type in the variant is the type of the “correct” value: it is the type of the value that is potentially passed to the next function.

We can easily lift functions returning Either with flowSeqE:

data ErrorE = ErrorE deriving (Show)

e :: Int -> IO (Either ErrorE Int)
e x = return (Right (x*100))

result2 = f `flowSeqE` e

> :t result2
result2 :: IO (Variant '[Int, ErrorE, ErrorF])

> matchVariant <$> result2
(Just 1000,Nothing,Nothing)

Similarly, we can lift functions that do not fail with flowSeqM:

square :: Int -> IO Int
square x = return (x*x)

result3 = f `flowSeqM` square `flowSeq` h

> :t result3
result3 :: IO (Variant '[Int, ErrorH, ErrorF])

> matchVariant <$> result3
100
(Just 120,Nothing,Nothing)

Note that a function may return several kinds of errors:

err :: Int -> IO (Variant '[Int,ErrorF,ErrorG])
err x
   | x > 10    = return (setVariant1 ErrorF)
   | x < 10    = return (setVariant2 ErrorG)
   | otherwise = return (setVariant0 123)

result4 = f `flowSeq` err `flowSeq` h

> :t result4
result4 :: IO (Variant '[Int, ErrorH, ErrorF, ErrorG, ErrorF])

> matchVariant <$> result4
123
(Just 143,Nothing,Nothing,Nothing,Nothing)

Most flow functions are biased in favor of the first element of the variant (the “correct” one). Nevertheless it is totally possible to perform custom operations during a flow:

import Control.Monad ((>=>))

-- | We suppose that if "g" fails, we can use the default value 18
fixG :: Variant '[Int,ErrorG] -> IO (Variant '[Int])
fixG v = return $ case matchVariant v of
   (Just r,_) -> setVariant0 r
   _          -> setVariant0 18

result5 =
   f
   `flowSeq` (g >=> fixG)
   `flowSeq` h

> :t result5
result5 :: IO (Variant '[Int, ErrorH, ErrorF])  -- no ErrorG!

> matchVariant <$> result5
10
18
(Just 38,Nothing,Nothing)

We can also directly match in the flow with flowMatch:

{-# LANGUAGE LambdaCase #-}

result6 =
   f
   `flowSeq` g
   `flowSeq` h
   `flowMatch` \case
      (Just r,_,_,_) -> putStrLn ("Success! "++ show r)
      (_,Just r,_,_) -> putStrLn ("Error in h: "++ show r)
      (_,_,Just r,_) -> putStrLn ("Error in g: "++ show r)
      (_,_,_,Just r) -> putStrLn ("Error in f: "++ show r)

> :t result6
result6 :: IO ()

> result6
10
Error in g: ErrorG

Finally, if we want to catch all the errors with a given type, we can use flowCatch:

ff = 
   f
   `flowSeq` g
   `flowSeq` h
   `flowSeq` g

> :t ff
ff :: IO (Variant '[Int, ErrorG, ErrorH, ErrorG, ErrorF])

result7 =
   ff
   `flowCatch` \case
      (Left ErrorG) -> do
         putStrLn "Handling error in g"
         return (setVariant0 18)
      (Right a)     -> return a

> :t result7
result7 :: IO (Variant '[Int, ErrorH, ErrorF])  -- not any ErrorG anymore!

> result7 `flowSeqM` print
10
Handling error in g
18

If we catch all the error types, we must be able to match the result with singleVariant, otherwise it doesn’t compile:

> singleVariant <$> result7 -- doesn't compile

<interactive>:275:19:
    Couldn't match type ‘'[ErrorH, ErrorF]’ with ‘'[]’   -- we know the error to catch
    Expected type: IO (Variant '[Int])
      Actual type: IO (Variant '[Int, ErrorH, ErrorF])
    In the second argument of ‘(<$>)’, namely ‘result7’
    In the expression: singleVariant <$> result7


result8 =
   ff
   `flowCatch` \case
      (Left ErrorG) -> do
         putStrLn "Handling error in g"
         return (setVariant0 18)
      (Right a)     -> return a
   `flowCatch` \case
      (Left ErrorF) -> do
         putStrLn "Handling error in f"
         return (setVariant0 10)
      (Right a)     -> return a
   `flowCatch` \case
      (Left ErrorH) -> do
         putStrLn "Handling error in h"
         return (setVariant0 42)
      (Right a)     -> return a

> :t singleVariant <$> result8
singleVariant <$> result8 :: IO Int

> singleVariant <$> result8
10
Handling error in g
18

Conclusion

Finally, we now have a control-flow framework that seems easy to work with (I still need to use it in production). It looks a little bit like checked-exceptions in Java, but first-class (we can manipulate them, etc.). I should add combinators usually found in Exception frameworks (finally, bracket, etc.).

If you have other ideas, do not hesitate to contact me!

All of this code is part of a larger project now called Haskus system