Haskell type system from the ground up (1)

October 4, 2016

I was thinking about how difficult it would be to explain GHC’s advanced type system features to non-Haskell programmers. I give it a go in this series of posts.

Avant donc que d’écrire, apprenez à penser. Selon que notre idée est plus ou moins obscure, L’expression la suit, ou moins nette, ou plus pure. Ce que l’on conçoit bien s’énonce clairement, Et les mots pour le dire arrivent aisément.

Hâtez-vous lentement, et sans perdre courage, Vingt fois sur le métier remettez votre ouvrage, Polissez-le sans cesse, et le repolissez, Ajoutez quelquefois, et souvent effacez.

– Nicolas Boileau, “L’Art Poétique”, 1674

Data types and functions

In Haskell we use functions to transform data:

data A = ... -- A, B and C are data
data B = ...
data C = ...

-- f is a function
f :: A -> B -> C
f = ...

Here we don’t care about what the function does and what the data represent. We just need to know that we can define functions that take some data as input and return some data.

The problem

Suppose we define the addition functions plusInt and plusFloat for Int and Float data types respectively:

data Float = ...
data Int   = ...

plusInt   :: Int   -> Int   -> Int
plusInt = ...

plusFloat :: Float -> Float -> Float
plusFloat ...

We would like to use the same + operator for both operations instead of the longer distinct names plusFloat and plusInt. However we can’t have two functions with the same name. Intuitionistically we would like + to select the correct function depending on the type of its parameters.

Solution: predicates and proofs

We would like a function such as:

(+) :: a -> a -> a

But we need a predicate to restrict a to data types that support addition. Currently we only have functions and data types: let’s use a parametric data type as a predicate and its instances as proofs.

In the following code, we use the data type Plus a as a predicate for the ability to use addition of data with type a. We create two proofs/instances of this predicate/data type: plusForInt and plusForFloat with the functions defined previously.

data Plus a = PlusOp
   { (+) :: a -> a -> a
   }

plusForInt :: Plus Int
plusForInt = PlusOp { (+) = plusInt }

plusForFloat :: Plus Float
plusForFloat = PlusOp { (+) = plusFloat }

> :type (+)
(+) :: Plus a -> a -> a -> a
-- equivalent to
(+) :: Plus a -> (a -> a -> a)

We can see that if we give to the + function a proof Plus a that a types can be added, it returns a function to add numbers of this type. It is completely trivial because + only extracts the function in the Plus a data type and returns it.

> :type ((+) plusForInt)
((+) plusForInt) :: Int -> Int -> Int
> :type ((+) plusForFloat)
((+) plusForFloat) :: Float -> Float -> Float

At this point we would like the compiler to automatically pass the correct proof to the + function depending on the type of the data we pass to it. Let’s use the => arrow to separate automatically passed proofs and normal parameters:

(+) :: Plus a => a -> a -> a

When the compiler sees such expression, it knows it has to implicitly pass an additional parameter that depends on the actual a type. If it can’t find it, it reports an error at compile time.

Given a few minor syntactic differences in actual Haskell code (cf next section), we can now write:

testInt :: Int
testInt = 8 + 3

testFloat :: Float
testFloat = 2.0 + 5.0

Type-classes

If you have followed down to this point, you should have understood the basics of Haskell type-classes (predicates/data types) and type-class instances (proofs/data instances). Let’s check the actual type of + in Haskell:

> :type (+)
(+) :: Num a => a -> a -> a

> :info Num
class Num a where       -- equivalent to: data Num a = NumOp
  (+) :: a -> a -> a    --                               { (+) :: a -> a -> a
  (-) :: a -> a -> a    --                               , (-) :: a -> a -> a
  (*) :: a -> a -> a    --                               , ...
  negate :: a -> a
  abs :: a -> a
  signum :: a -> a
  fromInteger :: Integer -> a

The predicate is called Num instead of our Plus as it is used as a predicate for more functions: +, -, *, negate, abs, signum and fromInteger.

The syntax class Num a where is just a fancy way of declaring a predicate/data type/type class Num with a single implicit NumOp constructor. Anonymous data instances/type class instances of this data type can be defined with:

instance Num Int where  -- equivalent to: numForInt :: Num Int
   (+) = plusInt        --                numForInt = NumOp { (+) = plusInt
   (-) = minusInt       --                                  , (-) = minusInt
   ...                  --                                  , ...

Num can be seen as a function from a type a to a data with type Num a if it exists, otherwise the compiler reports an error:

> "test" + "test"

<interactive>:34:1: error:
    • No instance for (Num [Char]) arising from a use of ‘+’
    ...

Similarily, the compiler checks for the uniqueness of the proofs for a given type:

test.hs:22:10: error:
    Duplicate instance declarations:
         instance Plus Int -- Defined at test.hs:22:10
         instance Plus Int -- Defined at test.hs:25:10

Conclusion

We have seen that the type-class mecanism allows us to implicitly select a data instance (a value) from a type. We can make it even clearer with the following example. getData is really a function from a type to a String value.

{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE AllowAmbiguousTypes  #-}

class TypeToData a where
   getData :: String

instance TypeToData Int where
   getData = "An Int"

instance TypeToData Float where
   getData = "A Float"

> getData @Int    
"An Int"          
> getData @Float  
"A Float"
> getData @String

<interactive>:91:1: error:
    • No instance for (TypeToData String) arising from a use of ‘getData’

We can check the type of getData with the GHC flag -ddump-tc:

getData :: forall a. TypeToData a => String

The type application getData @Int sets the type a to Int in the getData declaration. Hence it requires an implicit TypeToData Int proof from the compiler

Annexes

Non-solution 1: keep using different names

If we only have data types and functions, we could say that + is for the integer addition and +. is for float addition:

(+)  = plusInt
(+.) = plusFloat

If it hasn’t changed since the last time I have used it 12 years ago, it is the solution chosen by the OCaml language. It becomes an issue if you add several other data types and functions accordingly (e.g., Double, Word, etc.). You have to find a different operator for each case.

Non-solution 2: use modules

If we have modules, we could put each function in its own module:

module Int where

(+) = plusInt


module Float where

(+) = plusFloat

But then we can’t use both functions in the same module without resolving the ambiguity (prefixing with the module name, renaming the function during the import, etc.). Some Haskell libraries use this approach (Text vs String, etc.).

Non-solution 3: sum types

Haskell doesn’t support sub-typing. The closer we can get given what we currently have (data types and functions) is by using a sum type:

data IntFloat = A Int 
              | B Float

(+) :: IntFloat -> IntFloat -> IntFloat

Here IntFloat can either contain an Int or a Float. Hence + supports both integer addition and float addition. Additionally it has to support mixed up integer/float addition (i.e., the first argument may contain an integer and the second one a float, or vice-versa). In this case the result may either be an integer or a float: it is chosen by convention and the compiler doesn’t check nor enforce it.

Programming languages with subtyping, operator overloading or automatic coercion use such kinds of mecanisms. For instance, C would automatically coerce the integer into a float; Scala uses operator/function overloading.