Tags: GHC, Haskell | October 4, 2016 |
I was thinking about how difficult it would be to explain GHC’s advanced type system features to non-Haskell programmers. I give it a go in this series of posts.
Avant donc que d’écrire, apprenez à penser. Selon que notre idée est plus ou moins obscure, L’expression la suit, ou moins nette, ou plus pure. Ce que l’on conçoit bien s’énonce clairement, Et les mots pour le dire arrivent aisément.
Hâtez-vous lentement, et sans perdre courage, Vingt fois sur le métier remettez votre ouvrage, Polissez-le sans cesse, et le repolissez, Ajoutez quelquefois, et souvent effacez.
– Nicolas Boileau, “L’Art Poétique”, 1674
Data types and functions
In Haskell we use functions to transform data:
data A = ... -- A, B and C are data
data B = ...
data C = ...
-- f is a function
f :: A -> B -> C
f = ...
Here we don’t care about what the function does and what the data represent. We just need to know that we can define functions that take some data as input and return some data.
The problem
Suppose we define the addition functions plusInt
and plusFloat
for Int
and Float
data types respectively:
data Float = ...
data Int = ...
plusInt :: Int -> Int -> Int
plusInt = ...
plusFloat :: Float -> Float -> Float
plusFloat ...
We would like to use the same +
operator for both operations instead of the longer distinct names plusFloat
and plusInt
. However we can’t have two functions with the same name. Intuitionistically we would like +
to select the correct function depending on the type of its parameters.
Solution: predicates and proofs
We would like a function such as:
But we need a predicate to restrict a
to data types that support addition. Currently we only have functions and data types: let’s use a parametric data type as a predicate and its instances as proofs.
In the following code, we use the data type Plus a
as a predicate for the ability to use addition of data with type a
. We create two proofs/instances of this predicate/data type: plusForInt
and plusForFloat
with the functions defined previously.
data Plus a = PlusOp
{ (+) :: a -> a -> a
}
plusForInt :: Plus Int
plusForInt = PlusOp { (+) = plusInt }
plusForFloat :: Plus Float
plusForFloat = PlusOp { (+) = plusFloat }
> :type (+)
(+) :: Plus a -> a -> a -> a
-- equivalent to
(+) :: Plus a -> (a -> a -> a)
We can see that if we give to the +
function a proof Plus a
that a
types can be added, it returns a function to add numbers of this type. It is completely trivial because +
only extracts the function in the Plus a
data type and returns it.
> :type ((+) plusForInt)
((+) plusForInt) :: Int -> Int -> Int
> :type ((+) plusForFloat)
((+) plusForFloat) :: Float -> Float -> Float
At this point we would like the compiler to automatically pass the correct proof to the +
function depending on the type of the data we pass to it. Let’s use the =>
arrow to separate automatically passed proofs and normal parameters:
When the compiler sees such expression, it knows it has to implicitly pass an additional parameter that depends on the actual a
type. If it can’t find it, it reports an error at compile time.
Given a few minor syntactic differences in actual Haskell code (cf next section), we can now write:
Type-classes
If you have followed down to this point, you should have understood the basics of Haskell type-classes (predicates/data types) and type-class instances (proofs/data instances). Let’s check the actual type of +
in Haskell:
> :type (+)
(+) :: Num a => a -> a -> a
> :info Num
class Num a where -- equivalent to: data Num a = NumOp
(+) :: a -> a -> a -- { (+) :: a -> a -> a
(-) :: a -> a -> a -- , (-) :: a -> a -> a
(*) :: a -> a -> a -- , ...
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a
The predicate is called Num
instead of our Plus
as it is used as a predicate for more functions: +
, -
, *
, negate
, abs
, signum
and fromInteger
.
The syntax class Num a where
is just a fancy way of declaring a predicate/data type/type class Num
with a single implicit NumOp
constructor. Anonymous data instances/type class instances of this data type can be defined with:
instance Num Int where -- equivalent to: numForInt :: Num Int
(+) = plusInt -- numForInt = NumOp { (+) = plusInt
(-) = minusInt -- , (-) = minusInt
... -- , ...
Num
can be seen as a function from a type a
to a data with type Num a
if it exists, otherwise the compiler reports an error:
> "test" + "test"
<interactive>:34:1: error:
• No instance for (Num [Char]) arising from a use of ‘+’
...
Similarily, the compiler checks for the uniqueness of the proofs for a given type:
test.hs:22:10: error:
Duplicate instance declarations:
instance Plus Int -- Defined at test.hs:22:10
instance Plus Int -- Defined at test.hs:25:10
Conclusion
We have seen that the type-class mecanism allows us to implicitly select a data instance (a value) from a type. We can make it even clearer with the following example. getData
is really a function from a type to a String value.
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
class TypeToData a where
getData :: String
instance TypeToData Int where
getData = "An Int"
instance TypeToData Float where
getData = "A Float"
> getData @Int
"An Int"
> getData @Float
"A Float"
> getData @String
<interactive>:91:1: error:
• No instance for (TypeToData String) arising from a use of ‘getData’
We can check the type of getData
with the GHC flag -ddump-tc
:
The type application getData @Int
sets the type a
to Int
in the getData
declaration. Hence it requires an implicit TypeToData Int
proof from the compiler
Annexes
Non-solution 1: keep using different names
If we only have data types and functions, we could say that +
is for the integer addition and +.
is for float addition:
If it hasn’t changed since the last time I have used it 12 years ago, it is the solution chosen by the OCaml language. It becomes an issue if you add several other data types and functions accordingly (e.g., Double, Word, etc.). You have to find a different operator for each case.
Non-solution 2: use modules
If we have modules, we could put each function in its own module:
But then we can’t use both functions in the same module without resolving the ambiguity (prefixing with the module name, renaming the function during the import, etc.). Some Haskell libraries use this approach (Text vs String, etc.).
Non-solution 3: sum types
Haskell doesn’t support sub-typing. The closer we can get given what we currently have (data types and functions) is by using a sum type:
Here IntFloat
can either contain an Int
or a Float
. Hence +
supports both integer addition and float addition. Additionally it has to support mixed up integer/float addition (i.e., the first argument may contain an integer and the second one a float, or vice-versa). In this case the result may either be an integer or a float: it is chosen by convention and the compiler doesn’t check nor enforce it.
Programming languages with subtyping, operator overloading or automatic coercion use such kinds of mecanisms. For instance, C would automatically coerce the integer into a float; Scala uses operator/function overloading.