On the inlining of Integer and Natural operations
 Tags: Haskell, Bignum April 28, 2022

In this post I discuss the inlining of Integer and Natural operations in Haskell. It’s a promising performance work I’ve been conducting six months ago, which was blocked by an independent issue, but that I will likely resume soon as the issue has been fixed in the meantime.

Note: this post has been first published on IOHK Engineering blog.

To follow this post, you must know that `Natural` numbers are represented as follows in `ghc-bignum`:

``````-- | Natural number
--
-- Invariant: numbers <= WORD_MAXBOUND use the `NS` constructor
data Natural
= NS !Word#
| NB !BigNat#``````

Small naturals are represented with a `Word#` and large ones with a `BigNat#` (a `ByteArray#`).

Now consider the following simple example using Natural:

``````-- | Add 2 to a Word. Use Natural to avoid Word overflow
foo :: Word -> Natural
foo x = fromIntegral x + 2``````

There are only small naturals involved: `fromIntegral x` is small because `x` is a `Word`, and `2` is small. We could hope that GHC would use `Word#` primops to implement this and would allocate a `Natural` heap object for the result only. However it’s not what happens currently, even in GHC HEAD. In the following STG dump, we can see that a `Natural` heap object is allocated for `x` before calling `naturalAdd` (`let` bindings in STG reflect heap allocations):

``````foo1 = NS! [2##];

foo =
\r [x_sXn]
case x_sXn of {
W# x#_sXp ->
let { sat_sXq = NS! [x#_sXp]; } in  naturalAdd sat_sXq foo1;
};``````

Let’s look at `naturalAdd`:

``````-- | Add two naturals
naturalAdd :: Natural -> Natural -> Natural
naturalAdd (NS x) (NS y) =
(# l,0# #) -> NS l
(# l,c  #) -> NB (bigNatFromWord2# (int2Word# c) l)``````

We are clearly in the last case where both arguments are small. It seems beneficial to allow this function to be inlined. If we did we would get:

``````foo =
\r [x_s158]
case x_s158 of {
W# x#_s15a ->
case addWordC# [x#_s15a 2##] of {
(#,#) l_s15c ds_s15d ->
case ds_s15d<TagProper> of ds1_s15e {
__DEFAULT ->
case int2Word# [ds1_s15e] of sat_s15f {
__DEFAULT ->
case bigNatFromWord2# sat_s15f l_s15c of ds2_s15g {
__DEFAULT -> NB [ds2_s15g];
};
};
0# -> NS [l_s15c];
};
};
};``````

which produces much better assembly code, especially if there is no carry:

``````    addq \$2,%rax       ; add 2 to a machine word
setc %bl           ; test the carry.
movzbl %bl,%ebx    ; it could be done
testq %rbx,%rbx    ; more efficiently
jne _blk_c17c      ; with "jc"
_blk_c17i:
movq \$NS_con_info,-8(%r12) ; alloc NS datacon value
leaq -7(%r12),%rbx         ; make it the first argument
jmp *(%rbp)                ; call continuation
...``````

So why aren’t we always inlining `naturalAdd`? We even explicitly disallow it with a `NOINLINE` pragma. The reason is that `naturalAdd` and friends are involved in constant-folding rules.

For example, consider:

``````bar :: Natural -> Natural
bar x = x + 2

baz = bar 0x12345678913245678912345679123456798``````

Currently we get the following Core:

``````bar1 = NS 2##

bar = \ x_aHU -> naturalAdd x_aHU bar1

baz = NB 99114423092485377935703335253042771879834``````

You can see that `baz` is a constant thanks to constant-folding.

However if we let `naturalAdd` inline we get:

``````baz
of ds_d11H
{ __DEFAULT ->
NB ds_d11H
}``````

`baz` is no longer a constant.

A solution would be to add constant-folding rules for `BigNat#` functions, such as `bigNatAddWord#`. This is exactly what we have started doing in #20361. Our new plan is:

• Make `BigNat#` operation `NOINLINE` and add constant-folding rules for them
• Make Integer/Natural operations `INLINEABLE` (expose their unfolding)
• Hence rely on constant-folding for `Word#/Int#/BigNat#` to provide constant folding for `Integer` and `Natural`

The good consequences of this plan are:

• Less allocations when bignum operations are inlined and some of the arguments are known to be small/big or fully known (constant).
• `Integer` and `Natural` are less magical: you can implement your own similar types and expect the same performance without having to add new rewrite rules

There were some unforeseen difficulties with this plan though:

1. Some of the rewrite rules we need involve unboxed values such as `BigNat#` and `Word#` and the weren’t supported. Luckily, this has been recently fixed (#19313) by removing the “app invariant” (#20554). Thanks Joachim! That’s the reason why we could resume this work now.
2. Some unfoldings (RHSs) become bigger due to the inlining of bignum operations. Hence they may not themselves be inlined further due to inlining thresholds even if it would be beneficial. A better inlining heuristic would fix this (see #20516). It will likely be the topic of the next post.