Contributions to GHC 9.0
Tags: GHC, Haskell
March 13, 2021

Changelog

  • 2021-03-15: discussed on reddit

This is my GHC activities report for GHC 9.0.

If I’ve got the following script right, I’ve made 225 commits for the GHC 9.0 series at the time of writing.

Module hierarchy

I have completed the renaming of every GHC module to introduce a hierarchy.

Haddock for the GHC library before and after the changes:

image

You might think: “So what? You only renamed a few modules”. But in fact it wasn’t that simple: I had underestimated the amount of discussion that this kind of change would trigger.

First attempt: in June 2017 I did the work in a single big patch that was very difficult to review and to rebase; consequently it was never merged. I was asked to write a ghc-proposal, which was then flagged as out of scope (quite demoralizingly). I implemented ghc-api-compat, a package that helps compile code using the old naming (sadly the latest version can’t be uploaded to Hackage because of Cabal bug #4629). So I finally gave up in December 2017.

After enough time had passed to forget about this, I made a second attempt in 2019, this time with much smaller patches and waiting for each patch to be merged before starting work on the next one, to avoid too much rebasing. As the process took longer than expected (it ended in 2020), the refactoring wasn’t completed in time for the 8.10 release. That’s why 8.10 has only part of the hierarchy (e.g. GHC.Hs and GHC.StgToCmm).

The whole story is tracked in #13009.

This change mostly matters if you are a GHC developer or a user of the GHC API, because that’s what you interact with directly. But doing this was only a first step before starting to modularise GHC’s codebase. The codebase is still a lot of (functional) spaghetti code, but it’s getting better. Now we can try to enforce some rules such as “no, every part of the compiler shouldn’t directly use the driver state (HscEnv)”, etc.

Thanks to Takenobu Tani for fixing many references after these changes (in code comments, in the Wiki, etc.). Thanks to Richard Eisenberg and Simon Peyton Jones for their support.

Users guide

I found GHC’s users guide quite difficult to navigate, especially because of its theme: the table of contents on the right was often useless because you had to scroll up a lot to find it. So I’ve switched to the now very common ReadTheDocs theme, which keeps the TOC always visible on the left.

I had to adapt the CSS of the theme to try to match the old theme in some cases. It’s far from perfect, so anyone with CSS skills should feel free to improve it.

While making this change I noticed that the GHC Language Features page was huge and slow to render. So I’ve done the tedious work of splitting it. In doing so I fixed issue #17316, which I had missed before and in which other developers had been discussing such a split. It looks better now, but I did the split my own way (I don’t even remember exactly which criteria I followed; it was more than a year ago). So don’t hesitate to propose changes to improve it.

ghc-bignum

In August 2019 I started toying with an implementation of Natural numbers. When I started working for IOHK in September 2019 it turned out it would be useful to replace the previous Haskell native implementation (integer-simple) with a faster one for cross-compilation concerns. So I worked on it and ghc-bignum was merged in June 2020.

We published a post about this work in July 2020 on the IOHK blog: https://iohk.io/en/blog/posts/2020/07/28/improving-haskells-big-numbers-support/

I have implemented only simple algorithms (division was tricky enough to implement) but performance is pretty good, especially compared to integer-simple. The following chart from the blog post linked above compares performance results of some basic operations (note the log scale):

image
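To give an idea of what a “simple algorithm” looks like for big numbers, here is a minimal sketch of schoolbook addition on a list of 32-bit limbs. It is only an illustration: the Limbs type and the functions addLimbs, toLimbs and fromLimbs are hypothetical names, and ghc-bignum itself works on unlifted arrays of machine words rather than on Haskell lists.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- | A natural number as little-endian 32-bit limbs (least significant first).
-- Hypothetical representation, chosen for readability only.
type Limbs = [Word32]

-- | Schoolbook addition: add limb by limb and propagate the carry.
addLimbs :: Limbs -> Limbs -> Limbs
addLimbs = go 0
  where
    go :: Word64 -> Limbs -> Limbs -> Limbs
    go 0 []     []     = []
    go c []     []     = [fromIntegral c]
    go c (x:xs) []     = step c x 0 xs []
    go c []     (y:ys) = step c 0 y [] ys
    go c (x:xs) (y:ys) = step c x y xs ys

    -- The sum of two limbs and a carry always fits in a Word64.
    step :: Word64 -> Word32 -> Word32 -> Limbs -> Limbs -> Limbs
    step c x y xs ys =
      let s = fromIntegral x + fromIntegral y + c :: Word64
      in  fromIntegral s : go (s `shiftR` 32) xs ys  -- low 32 bits, then carry

-- | Base of the limb representation.
limbBase :: Integer
limbBase = 2 ^ (32 :: Int)

-- | Conversions used to compare against the built-in Integer
-- (non-negative inputs only; anything <= 0 maps to the empty list).
toLimbs :: Integer -> Limbs
toLimbs n
  | n <= 0    = []
  | otherwise = fromIntegral (n `mod` limbBase) : toLimbs (n `div` limbBase)

fromLimbs :: Limbs -> Integer
fromLimbs = foldr (\l acc -> fromIntegral l + acc * limbBase) 0
```

Loaded in GHCi, fromLimbs (addLimbs (toLimbs a) (toLimbs b)) can be compared against a + b for any non-negative a and b. Division is where this style of code stops being short, which matches the remark above about it being the tricky part.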

Providing a new native Haskell implementation of Integer/Natural was only part of the project. ghc-bignum also refactors the way support for big numbers is implemented. Instead of two separate packages (integer-gmp or integer-simple), it’s now always ghc-bignum but built with different Cabal flags to select the backend. This simple change allows other packages to depend on ghc-bignum unconditionally starting with GHC 9.0.

Note that an integer-gmp package is still distributed for backward compatibility, but it relies on ghc-bignum.

Implementing ghc-bignum led me to learn about and to modify both of GHC’s build systems, something that you don’t want to have to do. It also uncovered tricky issues like raising exceptions defined in the base package from a package that base depends on…

Thanks a lot to the early testers who uncovered bugs in ghc-bignum! Implementing this library led me to learn more about program proofs. A stupid error in such a library could easily go unnoticed, and then any user of such numeric code could compute wrong results: it’s quite scary!

By the way, you shouldn’t use GHC 9.0.1 in production because of a silly bug I introduced. My patch implementing constant folding for Natural numbers was only merged for the final release of GHC 9.0.1, and sadly it introduced a bogus rule. Constant folding rules in GHC are the second reason for me to learn about program transformation proofs.
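As an illustration of why such rules deserve a proof (or at least a property check), here is a minimal sketch of constant folding on a toy expression language over naturals. Everything in it (Expr, eval, constFold, preserves) is hypothetical and unrelated to GHC’s actual rule machinery; it does not reproduce the bug mentioned above. It only shows the property a folding rule has to satisfy: the folded expression must mean the same thing as the original one, including when the original would fail.

```haskell
import Numeric.Natural (Natural)

-- | A tiny expression language over naturals (hypothetical).
data Expr
  = Lit Natural
  | Var String
  | Add Expr Expr
  | Sub Expr Expr   -- partial: undefined when the result would be negative
  deriving (Eq, Show)

-- | Reference semantics. 'Nothing' models an underflow or an unbound variable.
eval :: [(String, Natural)] -> Expr -> Maybe Natural
eval _   (Lit n)   = Just n
eval env (Var v)   = lookup v env
eval env (Add a b) = (+) <$> eval env a <*> eval env b
eval env (Sub a b) = do
  x <- eval env a
  y <- eval env b
  if x >= y then Just (x - y) else Nothing

-- | Constant folding: replace literal-only subexpressions by their value.
constFold :: Expr -> Expr
constFold (Add a b) = case (constFold a, constFold b) of
  (Lit x, Lit y) -> Lit (x + y)
  (a', b')       -> Add a' b'
constFold (Sub a b) = case (constFold a, constFold b) of
  -- Fold only when the subtraction is defined. Folding an underflowing
  -- subtraction to some literal would silently change the program's meaning.
  (Lit x, Lit y) | x >= y -> Lit (x - y)
  (a', b')                -> Sub a' b'
constFold e = e

-- | The property every folding rule must satisfy.
preserves :: [(String, Natural)] -> Expr -> Bool
preserves env e = eval env (constFold e) == eval env e
```

In this sketch, removing the x >= y guard makes constFold itself throw on Sub (Lit 1) (Lit 2), and replacing it with a clamp to Lit 0 makes preserves return False for that expression. A property-testing library could check preserves on random expressions, but a check like this is not a proof.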

Modularisation

Finally, most of my time has been spent modularising the GHC codebase. Or more precisely, disentangling the codebase (modularisation should come later).

If you have ever used the ghc-api or looked at the codebase, your first reaction has probably been “WTF?!”. In particular, you must have encountered the most infamous datatype in the compiler: DynFlags. Despite its name, it contains a lot of random stuff. My guess is that as there was always a DynFlags in scope, it ended up being used as a Reader monad state with some IORefs in it. And when it wasn’t in scope… well, it always was, because there was an unsafeGlobalDynFlags global variable too.

As we were (and still are) working towards making GHC multi-target (currently a ghc executable can only produce code for a single target, e.g. x86-64-linux), it became necessary to untangle all this.

For example, before, the compiler could assume that there was only a single platform: the target one. So it just had to query DynFlags.targetPlatform everywhere. Now we have to pass the platform as a parameter to the appropriate functions. That’s why many functions that used to take a DynFlags parameter now take a Platform parameter. See this commit for an example of such huge patches.
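The shape of this refactoring can be sketched as follows. The types below are hypothetical, heavily simplified stand-ins for GHC’s real DynFlags and Platform (which have many more fields), and wordSizeInBitsOld and wordSizeInBits are made-up function names used only to contrast the two styles.

```haskell
-- Hypothetical, heavily simplified stand-ins for GHC's types.
data Arch = X86_64 | AArch64 | JavaScript
  deriving (Eq, Show)

data Platform = Platform
  { platformArch     :: Arch
  , platformWordSize :: Int   -- in bytes
  } deriving (Eq, Show)

-- | A grab-bag of settings, in the spirit of DynFlags.
data DynFlags = DynFlags
  { targetPlatform :: Platform
  , optLevel       :: Int
  , verbosity      :: Int
  }

-- Before: the function asks for the whole grab-bag but only reads the
-- target platform, so it cannot be used for any other platform.
wordSizeInBitsOld :: DynFlags -> Int
wordSizeInBitsOld dflags = 8 * platformWordSize (targetPlatform dflags)

-- After: the dependency is explicit and the caller chooses the platform.
wordSizeInBits :: Platform -> Int
wordSizeInBits platform = 8 * platformWordSize platform
```

With the second signature the same function can be called for the host and for the target, for example map wordSizeInBits [hostPlatform, jsPlatform] for two hypothetical Platform values, which is what a multi-target compiler needs.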

As my current objective is to implement support for compiler plugins (i.e. Haskell code for the host) in cross-compilers (i.e. a GHC producing code for a target different from the host, e.g. JavaScript), I have also made a lot of changes to the code handling units (compiled packages).

#17957 is the ticket that I use to track generic refactorings. #14335 is more specific to compiler plugins. There is still a lot of work to do in both cases.

Thanks to John Ericson who has made and still makes a lot of similar refactorings.

Conclusion

If you are a GHC API user, there is a 100% chance that I contributed to breaking your code with this release. Hopefully it’s for the best in the future, so please don’t be too mad!

Don’t forget to wait for GHC 9.0.2 before using GHC 9.0 in production.

I have done a few other random things; see the commit list below.

Appendix: commit list

# ghc-bignum

f5e5a1febf Bignum: fix bogus rewrite rule (#19345)
ba089952f0 Bignum: add Natural constant folding rules (#15821)
f4b11d23fe Bignum: add clamping naturalToWord (fix #18697)
c6851770e7 Bignum: fix for Integer/Natural Ord instances
bba8f79c7e Bignum: make GMP's bignat_add not recursive
d09e7e41cf Bignum: fix bigNatCompareWord# bug (#18813)
a740aa0bfc Bignum: match on small Integer/Natural
5d414fdc01 Bignum: implement integerPowMod (#18427)
175d714126 Bignum: implement integerRecipMod (#18427)
89a001505a Bignum: add integerNegate RULE
ebcc09687b Bignum: add bigNatFromWordArray
74f3f581dd Bignum: implement extended GCD (#18427)
6c98a930ee Bignum: refactor backend modules
c23275f4dd Bignum: add missing compat import/export functions
d5c3a027ec Bignum: add BigNat compat functions (#18613)
bf8bb9e785 Bignum: fix BigNat subtraction (#18604)
3745bdb69b Bignum: add more BigNat compat functions in integer-gmp
eab2511ed2 Bignum: add backward compat integer-gmp functions
817f94f51a Bignum: fix powMod for gmp backend (#18515)
b4cccab3cc Fix bug in Natural multiplication (fix #18509)
324967891a Bignum: add support for negative shifts (fix #18499)
929d26db30 Bignum: don't build ghc-bignum with stage0
d3bd689784 BigNum: rename BigNat types
1b3d13b68c Fix ghc-bignum exceptions
bccf3351a2 Add ghc-bignum to 8.12 release notes
a403eb917b ghc-bignum: fix division by zero (#18359)
9f96bc127d ghc-bignum library
fa4281d672 Bump bytestring and text submodules
dceecb093c Update Hadrian
f817d816e6 Update testsuite
aa9e7b7196 Update `make` based build system
0f67e3447e Update `base` package
96aa57878f Update compiler
57db91d8ee Remove integer-simple
f82a2f90ce Document GMP build [skip ci]
a3e907630f Fix documentation and fix "check" bignum backend (#18604)
a589fb949c Natural: fix left shift of 0 (fix #19170)
558d4d4a2b Split integerGmpInternals test in several parts
581753790d Hadrian: minor GMP refactoring

# Build system

faa36e5b36 Hadrian: ignore in-tree GMP objects with ``--lint``
c3c801e3c3 Replace more autotools obsolete macros (#19189)
062b3a7e8b touchy: use a valid cabal-version
bea3df2e85 Fix Windows build with autoconf >=2.70 (#19189)
0789f7a18a Hadrian: show default ghc-bignum backend (fix #18912)
d1de5c2271 Use Hadrian by default in validate script (#17527)
d25b6851bb Hadrian: ghc-gmp.h shouldn't be a compiler dependency
23e4e04700 Hadrian: fix PowerPC64le support (#17601)
b420fb2474 Hadrian: fix hp2ps error during cross-compilation
15ccca16e2 Hadrian: fix distDir per stage
7a07aa7181 Hadrian: fix cross-compiler build (#16051)
95da76c2b9 Hadrian: fix binary-dist target for cross-compilation
85fc32f03a Hadrian: fix dyn_o/dyn_hi rule (#17534)
75a185dc2a Hadrian: fix --summary
729bcb0271 Hadrian: fix build on Mac OS Catalina (#17798)
b989845e37 Hadrian: fix absolute buildroot support (#17822)
ee2c50cbee Hadrian: track missing configure results
2a2f51d79f Use configure script to detect that we should use in-tree GMP on Windows
34c7d23074 Fix Hadrian's ``--configure`` (fix #17883)
9d09411122 Hadrian: `docs` rule needs `configure` (#17840)
e2cce99732 Hadrian: fix source-dist target (#17849)
3cea67955a Make: fix sdist target (#17848)
d7029cc09e Hadrian: refactor GMP in-tree build support (#17756)
7550417ac8 Hadrian: drop Sphinx flag checking for PDF documentation (#17825)
9f2c3677b3 GMP expects the Target platform as --host parameter
1bfd825943 Ensure that Hadrian is built correctly before using it
414e2f6263 Force -fPIC for intree GMP (fix #17799)
4f11713567 Make: refactor GMP rules

# Numeric stuff (preliminaries to ghc-bignum)

fdcc53babb Optimise genericIntMul2Op
7a51b587ad Add constant folding rule (#16402)
35afe4f3b1 Use Int# primops in `Bits Int{8,16,32,64}` instances
fbbe18a274 Use the new timesInt2# primop in integer-gmp (#9431)
5f7cb423d7 Add `timesInt2#` primop
3656dff825 LLVM: fix MO_S_Mul2 support (#18434)
b5768cce02 Don't use timesInt2# with GHC < 8.11 (fix #18358)
1cca12edaa Constant-folding: don't pass through GHC's Int/Word (fix #11704)
aba51b6586 Add arithmetic exception primops (#14664)

# Module hierarchy

20800b9a9e Split GHC.Iface.Utils module
a426abb9b4 Rename GHC.Hs.Types into GHC.Hs.Type
37430251c3 Rename GHC.Core.Arity into GHC.Core.Opt.Arity
528df8ecb4 Modules: Core operations (#13009)
18a346a4b5 Modules: Core (#13009)
1941ef4f05 Modules: Types (#13009)
255418da5d Modules: type-checker (#13009)
15312bbb53 Modules (#13009)
af33244212 Modules: Utils and Data (#13009)
1500f0898e Modules: Llvm (#13009)
240f5bf6f5 Modules: Driver (#13009)
817f93eac4 Modules: Core (#13009)
1b1067d14b Modules: CmmToAsm (#13009)
cf739945b8 Module hierarchy: HsToCore (cf #13009)
da7f74797e Module hierarchy: ByteCode and Runtime (cf #13009)
6e2d9ee25b Module hierarchy: Cmm (cf #13009)
d491a6795d Module hierarchy: Renamer (cf #13009)
99a9f51bf8 Module hierarchy: Iface (cf #13009)
eb6082358c Module hierarchy (#13009): Stg

# Users guide

0a5e4f5f7d Split glasgow_exts into several files (#17316)
aeea92ef6f Switch to ReadTheDocs theme for the user-guide

# Modularisation

ce5408c062 Replace ghcWithNativeCodeGen with a proper Backend datatype
666acbd43d Correctly test active backend
e27698ce0b Remove unused sGhcWithNativeCodeGen
dbf77b79d9 Remove unused "ncg" flag
4e22de2a6f Don't panic if the NCG isn't built (it is always built)
30caeee751 DynFlags: remove use of sdocWithDynFlags from GHC.Stg.* (#17957)
0ddae2ba97 DynFlags: factor out pprUnitId from "Outputable UnitId" instance
41d2649288 DynFlags: avoid the use of sdocWithDynFlags in GHC.Core.Rules (#17957)
f08d6316d3 Replace Opt_SccProfilingOn flag with sccProfilingEnabled helper function
3cdd8d69f5 NCG: correctly handle addresses with huge offsets (#15570)
a04020b88d DynFlags: don't store buildTag
7ad4085c22 Fix invalid printf format
cad62ef119 Add tests for #17920
5f6a066551 LLVM: refactor and comment register padding code (#17920)
2636794d1a CmmToC: don't add extern decl to parsed Cmm data
7750bd456f Cmm: introduce SAVE_REGS/RESTORE_REGS
d4a0be7580 Move tablesNextToCode field into Platform
2af0ec9059 DynFlags: store default depth in SDocContext (#17957)
eb8115a8c4 Move CLabel assertions into smart constructors (#17957)
456e17f035 Bump haddock submodule and allow metric decrease
c10ff7e7e5 Doc: fix some comments
1fbb4bf5f3 NCGConfig: remove useless ncgUnitId field
bfd0a78cdd Don't return preload units when we set DyNFlags
ac964c8350 Put database cache in UnitConfig
28d804e1e1 Create helper upd_wired_in_home_instantiations
4274688a63 Move distrustAll into mkUnitState
fca2d25ff7 DynFlags: add UnitConfig datatype
8408d521a6 DynFlags: merge_databases
a444d01bc9 DynFlags: reportCycles, reportUnusable
42c054f6cd DynFlags: findWiredInUnits
4b53aac1e2 Refactor and document closeUnitDeps
5226da3784 Refactor and document add_package
36e1daf0a6 DynFlags: make listVisibleModuleNames take a UnitState
bd5810dc4e DynFlags: remove useless add_package parameter
9e715c1b84 Document getPreloadUnitsAnd
266bc3d9c3 DynFlags: refactor unwireUnit
9400aa9348 Remove preload parameter of mkUnitState
598cc1dde5 Move wiring of homeUnitInstantiations outside of mkUnitState
ae900605c4 Move dump_mod_map into initUnits
653d17bdd5 Rename Package into Unit (2)
55b4263e1a Remove ClosureUnitInfoMap
202728e529 Make ClosureUnitInfoMap uses UnitInfoMap
ed533ec217 Rename Package into Unit
f50c19b8a7 Rename listUnitInfoMap into listUnitInfo
d2109b4f10 Remove PreloadUnitId type alias
3d171cd6d5 Document and refactor `mkUnit` and `mkUnitInfoMap`
d345edfe96 Refactor WiredMap
9c5572cd29 Remove LinkerUnitId type alias
e7272d53e6 Enhance UnitId use
f6be6e432e Add allowVirtualUnits field in PackageState
8dc71f5577 Rename unsafeGetUnitInfo into unsafeLookupUnit
72d086106d Refactor homeUnit
7a02599afe Remove unused code
2517a51c0f DynFlags refactoring VIII (#17957)
cf772f19c0 Enhance Note [About units] for Backpack
1c91a7a095 Bump haddock submodule
a0ea59d641 Move Config module into GHC.Settings
566cc73f46 Move isDynLinkName into GHC.Types.Name
94e7c563ab Don't use DynFlags in showLinkerState (#17957)
cab1871ab9 Move LeadingUnderscore into Platform (#17957)
40c71c2cf3 Fix colorized error messages (#18128)
1d8f80cd64 Remove references to -package-key
de9fc995c2 Fully remove PprDebug
b3df9e780f Remove PprStyle param of logging actions
f8386c7b6a Refactor PprDebug handling
780de9e110 Use platform in Iface Binary
8bfb021958 Unit: split and rename modules
10d15f1ec4 Refactoring unit management code
ea717aa424 Factorize mungePackagePaths code
9e2c8e0e37 Refactor UnitInfo load/store from databases
69562e34fb Remove unused `emptyGenericUnitInfo`
10a2ba90aa Refactor UnitInfo
2cfc4ab971 Document backpack fields in DynFlags
747093b7c2 CmmToAsm DynFlags refactoring (#17957)
f2a98996e7 Avoid `sdocWithDynFlags` in `pprCLbl` (#17957)
ce5c2999d2 Avoid using sdocWithDynFlags (#17957)
35e43d48a9 Avoid DynFlags in Ppr code (#17957)
70be0fbcef GHC.Runtime: avoid DynFlags (#17957)
6655f93324 Use ParserFlags in GHC.Runtime.Eval (#17957)
3ca5215188 GHC.Core.Opt renaming
cc2918a040 Refactor CmmStatics
a485c3c404 Move blob handling into StgToCmm
f7597aa0c0 Testsuite: measure compiler stats for T16190
4980200255 Update Stack resolver for hadrian/build-stack
0002db1bf4 Kill wORDS_BIGENDIAN and replace it with platformByteOrder (#17957)
e54500c12d Store ComponentId details
f1a6c73d01 Merge GHC.Types.CostCentre.Init into GHC.Driver.CodeOutput
1c7c6f1afc Remove GHC.Types.Unique.Map module
0de03cd787 DynFlags refactoring III
64f2075669 Refactoring: use Platform instead of DynFlags when possible
2e82465fff Refactor CmmToAsm (disentangle DynFlags)
44fad4a925 Rename isDllName
a698997137 Use a Set to represent Ways
bc41e47123 Refactor interpreterDynamic and interpreterProfiled
8e6febcee4 Refactor GHC.Driver.Session (Ways and Flags)
18757cab04 Refactor runtime interpreter code
6880d6aa1e Disentangle DynFlags and SDoc
fa28ae95e4 Fix flag documentation (#17826)
16d643cfe6 Remove -ddump-srts flag
fb5c19122e Remove redundant case
29c701c154 Refactor package related code
bf38a20eef Call `interpretPackageEnv` from `setSessionDynFlags`
c8636da514 Fix LANG=C for readelf invocation in T14999
1ca9adbc88 Remove `parallel` check from configure.ac

# Misc

f307ed226f Revert "Remove SpecConstrAnnotation (#13681)" (#19168)
753fecd273 Display FFI labels (fix #18539)
c94c56d5a7 Export SPEC from GHC.Exts (#13681)
6dbd105421 Remove outdated note
fb544de7cd Fix parsing of PIE flags
6653e139db Fix minimal imports dump for boot files (fix #18497)
d3c2d59baf RTS: avoid overflow on 32-bit arch (#18375)
edc8d22b2e LLVM: support R9 and R10 registers
8d07c48ce3 test: fix conc038
951c1fb03d Fix unboxed-sums GC ptr-slot rubbish value (#17791)
58655b9da7 Add GHC-API logging hooks
d561c8f624 Add Cmm related hooks
40327b037f Remove outdated comment
0c114c6599 Handle large ARR_WORDS in heap census (fix #17572)
7bc3a65b46 Remove SpecConstrAnnotation (#13681)
f024b6e385 Expect T4267 to pass
dce50062e3 Rts: show errno on failure (#18033)
3445b96526 Only test T16190 with the NCG
0639dc10e2 T16190: only measure bytes_allocated
7432b327a3 Use correct option name (-opti) (fix #17314)
437265eb26 Avoid timing module map dump in initUnits
99823ed24b TH: fix Show/Eq/Ord instances for Bytes (#16457)
cd4434c8e7 Fix misleading Ptr phantom type in SerializedCompact (#15653)
8ea37b01b6 RTS: workaround a Linux kernel bug in timerfd