About GHC's stability in 2024
Tags: GHC, Haskell
August 7, 2024

There are lots of discussions about GHC’s “stability”, making base “reinstallable”, and similar topics. As we’ve been harping on about these topics for years now, I figured I would try to summarize the issues and the current status here.

What are the issues?

A compiler LC for a language L takes some source code SC written in L (hopefully) and generates object code OC for a target platform T. The compiler runs on a host platform H.

We’ll use (and abuse) the following notation to summarize this:

SC(L) --LC(H)--> OC(T)

The issue is that GHC is different from this in two aspects:

  1. L isn’t independent of LC
  2. OC(T) must be linkable with LC(H) (by default)
    • This implies some constraints on L too, compounding (1)

Let’s discuss this more.

L isn’t independent of LC

A new version of GHC supports only a “new” Haskell language. There may be backward incompatible changes in the compiler (e.g. to the way GHC typechecks programs), but these are rare. What’s more frequent is that the standard API offered to Haskell programs (i.e. boot libraries such as base) changes with the compiler.

As a result, a program compiling with GHC version N may not compile with GHC version N+1. It’s as if the language itself were versioned, as illustrated below.

We have some source code SC compiling with LC-1.0:

SC(L-1.0) --LC-1.0(H)--> OC(T)

A new compiler is released. Let's try to use it!

SC(L-1.0) --LC-2.0(H)--> FAILURE

We need to tweak the source code to make it compilable with LC-2.0:

SC-2.0(L-2.0) --LC-2.0(H)--> OC(T)

The issue is: we can’t use a new compiler without modifying the source code we want to compile. As a result:

  • we can’t know whether regressions arise from the code changes or from the compiler upgrade.
  • as changing code can be costly (all the dependencies need to be fixed in cascade…), some code bases are stuck with old compiler versions and new compilers get less testing than we would hope for.
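
To make this concrete, here is one classic breaking change: base-4.11 (shipped with GHC 8.4) made Semigroup a superclass of Monoid, so instances that only defined mempty and mappend stopped compiling and had to be adapted.

module Log where

newtype Log = Log [String]

-- This compiled before base-4.11 but is rejected afterwards,
-- because the Semigroup superclass instance is missing:
--
--   instance Monoid Log where
--     mempty  = Log []
--     mappend (Log a) (Log b) = Log (a ++ b)

-- The adapted code, for base-4.11 and later:
instance Semigroup Log where
  Log a <> Log b = Log (a ++ b)

instance Monoid Log where
  mempty = Log []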

The project to fix this is referred to as reinstallable base. The idea is to decouple the base library from the GHC compiler used:

  • Let’s call ghc-base the standard library provided by GHC and versioned with it.
    • Note: currently ghc-base is split into ghc-prim, ghc-bignum, and ghc-internal.
  • ghc-base changes with every GHC release to add/remove/rename/deprecate/modify stuff

  • base abstracts over changes in ghc-base for some releases, with some overlap to allow incremental updates: e.g.

base-1.0 depends on ghc-base >= 8.10 && < 9.8
base-2.0 depends on ghc-base >= 9.6 && < 9.10
base-3.0 depends on ghc-base >= 9.8 && < 9.12

When a new ghc-base is released, existing base releases may or may not be adapted to support it:

base-1.0 depends on ghc-base >= 8.10 && < 9.8
base-2.0 depends on ghc-base >= 9.6 && < 9.10
base-3.0.1 depends on ghc-base >= 9.8 && < 9.14

It’s up to base maintainers to decide how many releases they want to support and when to make major breaking releases. But:

  • base support for ghc-base should overlap to allow bumping base without bumping ghc-base, and vice-versa.
  • as GHC depends on base, there should always be a base version supporting the development version of ghc-base so that it can bootstrap itself (maybe with --allow-newer set only for ghc-base when compiling base for GHC).

Now if we have some source code depending on base (and not ghc-base):

  • we can switch to a new compiler (i.e. bumping ghc-base too) without modifying the code, as long as the base version used supports the ghc-base version provided by the new compiler.

  • we can switch to a new version of base without switching to a new compiler, as long as the new base version used supports the ghc-base version provided by the current compiler.

Template Haskell

ghc-base contains the data types expected by the compiler for the abstract syntax of Template Haskell.

So, similarly to base, the template-haskell package also needs to be versioned to support several ghc-base versions, so that packages using template-haskell aren’t tied to a specific ghc-base version through their template-haskell dependency:

template-haskell-1.0 depends on ghc-base >= 8.10 && < 9.4
template-haskell-2.0 depends on ghc-base >= 9.2 && < 9.10
template-haskell-3.0 depends on ghc-base >= 9.8 && < 9.12
...
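
To see why packages get tied this way, consider a minimal splice (a made-up example): any package defining or using one imports types such as Q and Exp from template-haskell, and the compiler has to build and run that code at compile time with a matching ABI.

{-# LANGUAGE TemplateHaskell #-}
module Answer where

import Language.Haskell.TH

-- Q and Exp come from the template-haskell package: the compiler
-- runs this code at compile time and consumes the resulting syntax
-- tree, so these types must match what the compiler expects.
answerExp :: Q Exp
answerExp = litE (integerL 42)

-- Used elsewhere as:  answer = $(answerExp)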

Note: should template-haskell be merged into base to make dependencies simpler? [Edit: Teofil Camarasu explained to me that template-haskell has dependencies that base doesn’t have, so that’s probably a no-go]

Status of the “reinstallable base” project

  • base has been split into two parts: ghc-internal and base
  • wired-in entities have been moved from template-haskell to ghc-internal
  • technical changes have been made: removal of wired-in unit-ids, fixes to cabal’s knowledge of reinstallable packages, Hadrian support for two versions of template-haskell, etc.
  • changes to base interface (exports…) are tracked on CI and aren’t done lightly anymore (they require an accepted CLC proposal)
  • I’ve suggested merging ghc-prim, ghc-bignum, ghc-internal into a single package in #24453. Not done yet.
  • Probably more. I’m not involved much in this project.

Note that contrary to the model presented above, there is still only a single actively maintained version of base and template-haskell for now.

What about “reinstallable ghc-base”?

We may want to reinstall ghc-base (and the RTS) too! For example, to change the bignum backend used for Integer operations (native vs GMP).

Doing this should be possible. As it’s similar to performing cross-compilation, it’s discussed in the relevant section below.

OC(T) must be linkable with LC(H)

In plain English: the object code produced by the compiler must be loadable/linkable into the running compiler.

This is required by the internal interpreter used to implement:

  • Template Haskell
  • Annotations
  • Plugins
  • (GHCi; we don’t really care about it here as it’s not used to compile code)

Here is how the internal interpreter executes code at compilation time:

  • GHC compiles the code it wants to run into object code (OC(T))
  • it loads/links the object code into its process (e.g. with dlopen when GHC is dynamically linked)
  • it looks up the symbol of the function it wants to call in the loaded object code
  • it unsafe-coerces the function pointer into a proper Haskell function type
  • it calls the function normally

It is safe to unsafe-coerce here because values live in the same heap (the one of the GHC process).
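
Here is a rough sketch of the pattern, assuming a hypothetical plugin.so exporting a plugin_entry function. It uses dlopen/dlsym from the unix package and a foreign “dynamic” wrapper; GHC’s real interpreter instead looks up a Haskell closure and unsafe-coerces it, which ordinary user code can’t do portably.

{-# LANGUAGE ForeignFunctionInterface #-}
module LoadSketch where

import Foreign.Ptr (FunPtr)
import System.Posix.DynamicLinker (RTLDFlags (RTLD_NOW), dlopen, dlsym)

-- Turn a raw function pointer into a callable Haskell function.
foreign import ccall "dynamic"
  callEntry :: FunPtr (Int -> IO Int) -> Int -> IO Int

main :: IO ()
main = do
  dl <- dlopen "plugin.so" [RTLD_NOW] -- load object code into this process
  fp <- dlsym dl "plugin_entry"       -- look up the symbol to call
  result <- callEntry fp 42           -- coerce the pointer and call it
  print result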

Note: we ignore ByteCode here as the same issue would arise anyway with ByteCode calling into native code (FFI calls, precompiled boot libraries without ByteCode, etc.).

Note: called functions must be pure: if GHC is linked statically with libFOO, loading some code that depends on libFOO will load libFOO a second time, without linking the global variables of the two libFOO instances together. This was an issue for code depending on the ghc library, which used global variables unsafely (these global variables are now properly shared via the RTS).

“OC(T) must be linkable with LC(H)” is a major constraint because it means the object code must have the same ABI as the compiler.

However, we need ways to work around this constraint:

  • to build new GHCs
  • to support GHC ways
  • to support cross-compilation

How do we build a new GHC?

Suppose we want to build GHC-2.0 using the GHC-1.0 program and that they don’t have the same ABI. We use a two-step process as follows:

SC-GHC-2.0 --GHC-1.0(ABI-1.0)--> GHC-2.0(ABI-1.0)

And then:

SC-GHC-2.0 --GHC-2.0(ABI-1.0)--> GHC-2.0(ABI-2.0)

After the two-step process, GHC-2.0(ABI-2.0) has ABI-2.0 and produces object code with ABI-2.0, so it can load it. All good.

Note however that the second step shouldn’t be allowed because of our constraint “OC(T) must be linkable with LC(H)”, i.e. “GHC-2.0(ABI-2.0) must be linkable with GHC-2.0(ABI-1.0)”: the ABIs don’t match! We only get away with this because GHC’s source code doesn’t use any of the features above (TH, annotations, plugins)! This is a constraint, but it’s less inconvenient than never being able to change the ABI.

Note: GHC’s newer source code (SC-GHC-2.0) must be compilable by two different GHCs: an old GHC (e.g. GHC-1.0) and the new GHC itself (GHC-2.0). That’s one reason we are reluctant to add new dependencies to GHC: we have to adapt them ourselves so that they remain buildable with the newer GHC. If GHC becomes more stable, then maybe we can start adding more dependencies.

What about cross-compilation and compiler ways?

Remember our illustration:

SC(L) --LC(H)--> OC(T)

With cross-compilation, the compiler host platform H is really different from the object code target platform T. For example, if H is linux-x86-64 and T is javascript then we really can’t link JavaScript “object code” with Linux x86-64 ELF object code.

The same is true for GHC’s “ways”. For example, when profiling is enabled, the compiler generates object code with a different ABI: a heap object for an Int has one more field when profiling is enabled, so we can’t unsafe-coerce it into the Int expected by a non-profiled compiler without risking segfaults.

External interpreter

This is where the external interpreter is useful. If we have another interpreter process that is built using the target ABI T, then this interpreter can load the object code OC(T) produced by our cross-compiler. The compiler and this external interpreter then just have to exchange properly serialized data (without unsafe coercions) over a pipe or a socket.

  1. SC(L) --LC(H)--> OC(T)
  2. LC(H) spawns an external interpreter EI(T) for the T target
  3. LC(H) asks EI(T) to load OC(T) (and its dependencies)
  4. LC(H) asks EI(T) to run function foo and to return its result serialized (e.g. a Template Haskell abstract syntax tree).

Note: step 2 assumes that EI(T) can be spawned on the compiler host platform H. If it isn’t the case, it is possible to spawn a proxy interpreter that communicates with EI(T) spawned on the appropriate platform (e.g. in qemu, wine, or on a remote server). See iserv-proxy.
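
As a toy illustration of steps 3 and 4 (only the shape of the idea; GHC’s real protocol lives in the ghci package and is much richer), messages can be serialized with the binary package and written to a pipe:

{-# LANGUAGE DeriveGeneric #-}
module IservSketch where

import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)
import System.IO (Handle, hFlush)

-- Hypothetical messages exchanged between LC(H) and EI(T).
data Request
  = LoadObject FilePath -- step 3: load OC(T) and its dependencies
  | RunSplice String    -- step 4: run a named function remotely
  deriving (Generic, Show)

instance Binary Request

-- The compiler writes a serialized request down the pipe and reads
-- an equally serialized response back: no value is ever shared
-- between the two heaps, so no unsafe coercion is needed. A real
-- protocol would also frame each message with its length.
sendRequest :: Handle -> Request -> IO ()
sendRequest h req = BL.hPut h (encode req) >> hFlush h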

Who provides the EI(T) external interpreter program?

  • GHC distributes some for the ways it has been built to support (profiling, etc.)
  • haskell.nix supports even more of them, even automatically using qemu and wine.
  • I’ve suggested that they should be built on-demand by GHC (#24731) to avoid the need to distribute them (still to be done).

The external interpreter has a better/safer design than the internal interpreter: why isn’t it the default?

  • It’s slower: it needs to spawn another process with its own RTS (GC…) and to serialize data for communication. It’s difficult to compete with the internal interpreter, which uses the same process/GC and (unsafe-)coerces values at zero cost.

  • Plugins aren’t supported by the external interpreter!

Why aren’t plugins supported by the external interpreter?

In theory, plugins could be supported by the external interpreter: a plugin provides some functions that GHC could call remotely, just as it does for Template Haskell. BUT:

  • GHC uses mutable global state (e.g. for the FastString table and for the Unique counter): the external interpreter process would have to query the compiler process every time it wants to read or modify this state. The code would need to be refactored to ensure that a remote command is sent instead of using global variables in the external interpreter itself.

  • Plugins have access to the whole compiler state (e.g. via the HscEnv datatype) and it contains mutable variables (MVar)… We would need to be able to serialize these data types to send them back and forth between the compiler and the external interpreter.

A better alternative would be to refine the plugin interface to only allow access to easily serializable types. This requires designing a proper protocol instead of assuming that GHC internals are fully accessible from a plugin. It would be a massive undertaking that would break every plugin, so it’s unlikely to happen.
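
Purely to illustrate the shape such a restricted interface could take (nothing like this exists; all names are made up), the plugin would only ever see self-contained, serializable values instead of HscEnv:

module RestrictedPluginSketch where

-- A hypothetical serialization-friendly plugin API: the compiler
-- sends a self-contained summary of a binding, the plugin answers
-- with an equally self-contained verdict, so it could run in the
-- external interpreter process.
data BindingSummary = BindingSummary
  { bindingName :: String
  , bindingRhs  :: String -- e.g. a pretty-printed Core expression
  }

data Verdict
  = Keep
  | ReplaceWith String -- replacement expression, in the same form

type RemotePlugin = BindingSummary -> IO Verdict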

Hey, couldn’t we run plugins with the internal interpreter and the rest with the external interpreter?

In theory maybe, in practice no.

A GHC process isn’t multi-target: it assumes that there is a single target platform, a single interpreter, a single loader state, etc.

Package databases for plugin packages and target packages are somewhat distinct, but not always. Sometimes that’s on purpose (the same packages are expected to be used for the plugins and for the target code) and sometimes not (we reported some occasional leaks between plugins and target code in our paper).

Also, cabal doesn’t provide an interface to declare plugin dependencies separately from target code dependencies (see cabal#2965). So the GHC distinction between plugin packages and target packages is probably never used, except in GHC’s testsuite.

I’ve spent countless hours refactoring GHC’s code to make progress towards supporting several targets in the same process but we’re still far from supporting it. There are many open questions:

  • plugins may depend on different wired-in packages than target code. How do we deal with this internally?
  • plugins are allowed in the home-unit (the library we’re building): should we build the library twice? (for plugins then for the target)
    • where do we look for the plugin object code in one-shot mode?
  • what about multiple home-units?

We’re quickly reinventing cabal inside GHC just for plugins… As I was getting nowhere with the refactoring, I’ve switched to another approach as a workaround: -fplugin-library.

What’s the -fplugin-library workaround?

The idea is to totally bypass GHC’s compilation pipeline for plugins and to load them directly from .so/.dll libraries. It requires two compilers:

  • the one used to build plugins (GHC(H,H): runs on H and targets H)
  • the one used to generate target code (GHC(H,T): runs on H and targets T).

Both compilers are used like this:

SC-Plugin --GHC(H,H)--> OC-Plugin(H) (e.g. myplugin.so)

SC(L) --{GHC(H,T) -fplugin-library=myplugin.so;... }--> OC(T)
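
Concretely, this might look as follows. The plugin below is a do-nothing Core plugin; ghc-host, ghc-cross, and the unit-id are hypothetical names, and the exact -fplugin-library argument syntax (library path, unit-id, module name, plugin arguments) is documented in the GHC user’s guide.

module MyPlugin (plugin) where

import GHC.Plugins

-- GHC looks for a value named "plugin" in the plugin module.
plugin :: Plugin
plugin = defaultPlugin { installCoreToDos = \_opts todos -> return todos }

And then:

ghc-host -dynamic -shared MyPlugin.hs -o myplugin.so
ghc-cross "-fplugin-library=myplugin.so;my-plugin-0.1;MyPlugin;[]" Main.hs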

Issues are:

  • we need to manage two GHCs; the first must produce plugin code that is ABI-compatible with the second
  • plugins need to be built manually before calling the GHC process using them.
  • plugins loaded with the usual -fplugin flag aren’t automatically supported.

Perhaps GHC could spawn another GHC process to build its plugins and then load them as with -fplugin-library. That would work around the isolation issue without having to refactor the compiler code.

Could we make GHC multi-target instead of using two separate GHCs?

Instead of managing two GHCs or more (one for plugins that will be loaded with -fplugin-library by other GHCs, some more for the targets), we could use a single ghc program to which we pass different flags to select the target.

This is already possible, as we’ve made GHC target-agnostic: we just have to pass a different “top-dir” with the -B flag to select different settings and boot packages.
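
For example (the top-dir paths are hypothetical):

ghc -B/opt/ghc/topdir-native MyPlugin.hs    (host settings and boot packages, e.g. for plugins)
ghc -B/opt/ghc/topdir-js Main.hs            (settings and boot packages for the JavaScript target)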

We could imagine:

  • GHC being aware of the -B flag required to build plugins. Then it would invoke itself to build them with the proper settings/packages.

  • a tool with a nice UI that would be used to generate fresh “top-dirs” (settings, boot packages…) for different targets.

The second point would also allow the “reinstallation” of ghc-base mentioned earlier. Using a different ghc-base would be similar to cross-compiling to a different platform: we can’t link the generated code with the compiler because of the discrepancies between the ghc-base used to build the compiler and ghc-base used to build the target code. Similarly, we could “reinstall” all the boot packages: rts, directory, unix, base… Just like what Hadrian does.

Note: plugins depend on the ghc library (ghc-lib), which itself depends on third-party libraries (directory, unix, base…) so they would still be forced to use the libraries used to build GHC itself. This is a real issue. Someone reported exactly this just while I was writing this post:

I am trying to write a plugin but am having the problem where the ghc package want transformers 0.5.6.2 but the package I want the plugin to modify requires 0.6.1.1.

Since, the compiled plugin objects do not intermix with the compiled project objects it seems like this should not actually be a conflict. Is there a way to resolve this?

Plugins/TH status

  • -fplugin-library has been implemented since GHC 9.6.
  • It requires compilers to be stage2, which cross-compilers aren’t by default yet
    • there is some work in progress to fix this: #19174
  • Automatic build of the external interpreter is what the JavaScript backend does (approximately)
    • Generalization to other targets is still to do: #24731
  • ghc-toolchain is a new tool that should provide the nice UI to generate new target “top-dirs”
    • I’m not fully aware of its status and roadmap

Ultimately it would be nice to only distribute:

  • programs: ghc, ghc-pkg, ghc-toolchain
  • libraries for plugins: ghc-base, ghc-lib
  • settings for plugins

and to let ghc-toolchain (maybe called via cabal) initialize for every target:

  • the settings (toolchain to use, etc.)
  • the boot packages (ghc-base, base, template-haskell, ghci…)
  • an external interpreter