Haskell type system from the ground up (1)
 Tags: GHC, Haskell October 4, 2016

I was thinking about how difficult it would be to explain GHC’s advanced type system features to non-Haskell programmers. I give it a go in this series of posts.

Avant donc que d’écrire, apprenez à penser. Selon que notre idée est plus ou moins obscure, L’expression la suit, ou moins nette, ou plus pure. Ce que l’on conçoit bien s’énonce clairement, Et les mots pour le dire arrivent aisément.

Hâtez-vous lentement, et sans perdre courage, Vingt fois sur le métier remettez votre ouvrage, Polissez-le sans cesse, et le repolissez, Ajoutez quelquefois, et souvent effacez.

Nicolas Boileau, “L’Art Poétique”, 1674

### Data types and functions

In Haskell we use functions to transform data:

``````data A = ... -- A, B and C are data
data B = ...
data C = ...

-- f is a function
f :: A -> B -> C
f = ...``````

Here we don’t care about what the function does and what the data represent. We just need to know that we can define functions that take some data as input and return some data.

### The problem

Suppose we define the addition functions `plusInt` and `plusFloat` for `Int` and `Float` data types respectively:

``````data Float = ...
data Int   = ...

plusInt   :: Int   -> Int   -> Int
plusInt = ...

plusFloat :: Float -> Float -> Float
plusFloat ...``````

We would like to use the same `+` operator for both operations instead of the longer distinct names `plusFloat` and `plusInt`. However we can’t have two functions with the same name. Intuitionistically we would like `+` to select the correct function depending on the type of its parameters.

### Solution: predicates and proofs

We would like a function such as:

``(+) :: a -> a -> a``

But we need a predicate to restrict `a` to data types that support addition. Currently we only have functions and data types: let’s use a parametric data type as a predicate and its instances as proofs.

In the following code, we use the data type `Plus a` as a predicate for the ability to use addition of data with type `a`. We create two proofs/instances of this predicate/data type: `plusForInt` and `plusForFloat` with the functions defined previously.

``````data Plus a = PlusOp
{ (+) :: a -> a -> a
}

plusForInt :: Plus Int
plusForInt = PlusOp { (+) = plusInt }

plusForFloat :: Plus Float
plusForFloat = PlusOp { (+) = plusFloat }

> :type (+)
(+) :: Plus a -> a -> a -> a
-- equivalent to
(+) :: Plus a -> (a -> a -> a)``````

We can see that if we give to the `+` function a proof `Plus a` that `a` types can be added, it returns a function to add numbers of this type. It is completely trivial because `+` only extracts the function in the `Plus a` data type and returns it.

``````> :type ((+) plusForInt)
((+) plusForInt) :: Int -> Int -> Int
> :type ((+) plusForFloat)
((+) plusForFloat) :: Float -> Float -> Float``````

At this point we would like the compiler to automatically pass the correct proof to the `+` function depending on the type of the data we pass to it. Let’s use the `=>` arrow to separate automatically passed proofs and normal parameters:

``(+) :: Plus a => a -> a -> a``

When the compiler sees such expression, it knows it has to implicitly pass an additional parameter that depends on the actual `a` type. If it can’t find it, it reports an error at compile time.

Given a few minor syntactic differences in actual Haskell code (cf next section), we can now write:

``````testInt :: Int
testInt = 8 + 3

testFloat :: Float
testFloat = 2.0 + 5.0``````

### Type-classes

If you have followed down to this point, you should have understood the basics of Haskell type-classes (predicates/data types) and type-class instances (proofs/data instances). Let’s check the actual type of `+` in Haskell:

``````> :type (+)
(+) :: Num a => a -> a -> a

> :info Num
class Num a where       -- equivalent to: data Num a = NumOp
(+) :: a -> a -> a    --                               { (+) :: a -> a -> a
(-) :: a -> a -> a    --                               , (-) :: a -> a -> a
(*) :: a -> a -> a    --                               , ...
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a``````

The predicate is called `Num` instead of our `Plus` as it is used as a predicate for more functions: `+`, `-`, `*`, `negate`, `abs`, `signum` and `fromInteger`.

The syntax `class Num a where` is just a fancy way of declaring a predicate/data type/type class `Num` with a single implicit `NumOp` constructor. Anonymous data instances/type class instances of this data type can be defined with:

``````instance Num Int where  -- equivalent to: numForInt :: Num Int
(+) = plusInt        --                numForInt = NumOp { (+) = plusInt
(-) = minusInt       --                                  , (-) = minusInt
...                  --                                  , ...``````

`Num` can be seen as a function from a type `a` to a data with type `Num a` if it exists, otherwise the compiler reports an error:

``````> "test" + "test"

<interactive>:34:1: error:
• No instance for (Num [Char]) arising from a use of ‘+’
...``````

Similarily, the compiler checks for the uniqueness of the proofs for a given type:

``````test.hs:22:10: error:
Duplicate instance declarations:
instance Plus Int -- Defined at test.hs:22:10
instance Plus Int -- Defined at test.hs:25:10``````

### Conclusion

We have seen that the type-class mecanism allows us to implicitly select a data instance (a value) from a type. We can make it even clearer with the following example. `getData` is really a function from a type to a String value.

``````{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE AllowAmbiguousTypes  #-}

class TypeToData a where
getData :: String

instance TypeToData Int where
getData = "An Int"

instance TypeToData Float where
getData = "A Float"

> getData @Int
"An Int"
> getData @Float
"A Float"
> getData @String

<interactive>:91:1: error:
• No instance for (TypeToData String) arising from a use of ‘getData’``````

We can check the type of `getData` with the GHC flag `-ddump-tc`:

``getData :: forall a. TypeToData a => String``

The type application `getData @Int` sets the type `a` to `Int` in the `getData` declaration. Hence it requires an implicit `TypeToData Int` proof from the compiler

### Annexes

#### Non-solution 1: keep using different names

If we only have data types and functions, we could say that `+` is for the integer addition and `+.` is for float addition:

``````(+)  = plusInt
(+.) = plusFloat``````

If it hasn’t changed since the last time I have used it 12 years ago, it is the solution chosen by the OCaml language. It becomes an issue if you add several other data types and functions accordingly (e.g., Double, Word, etc.). You have to find a different operator for each case.

#### Non-solution 2: use modules

If we have modules, we could put each function in its own module:

``````module Int where

(+) = plusInt

module Float where

(+) = plusFloat``````

But then we can’t use both functions in the same module without resolving the ambiguity (prefixing with the module name, renaming the function during the import, etc.). Some Haskell libraries use this approach (Text vs String, etc.).

#### Non-solution 3: sum types

Haskell doesn’t support sub-typing. The closer we can get given what we currently have (data types and functions) is by using a sum type:

``````data IntFloat = A Int
| B Float

(+) :: IntFloat -> IntFloat -> IntFloat``````

Here `IntFloat` can either contain an `Int` or a `Float`. Hence `+` supports both integer addition and float addition. Additionally it has to support mixed up integer/float addition (i.e., the first argument may contain an integer and the second one a float, or vice-versa). In this case the result may either be an integer or a float: it is chosen by convention and the compiler doesn’t check nor enforce it.

Programming languages with subtyping, operator overloading or automatic coercion use such kinds of mecanisms. For instance, C would automatically coerce the integer into a float; Scala uses operator/function overloading.