Fast file embedding with GHC!
Tags: GHC, Haskell, Haskus
January 15, 2019

If, like me, you sometimes want to embed resource files into executables generated by GHC, you may have already used the file-embed package. We are not alone: it is used directly by 101 other packages on Hackage at the time of writing.

The problem is: compile time and memory usage are awful for “big” files, where “big” = a few megabytes.

Here is the result of a small benchmark I wrote. It shows the time needed to compile a Haskell source file that embeds a zero-filled data file of the given size with file-embed:

================================
| Benchmarking file-embed      |
================================
Warming up...
# Benchmarking size: 128b
real    0m1,394s
user    0m1,191s
sys     0m0,315s

# Benchmarking size: 3K
real    0m1,259s
user    0m1,060s
sys     0m0,269s

# Benchmarking size: 3M
real    0m7,611s
user    0m6,510s
sys     0m1,183s

# Benchmarking size: 15M
real    0m32,582s
user    0m28,553s
sys     0m3,866s

# Benchmarking size: 150M
... I had to kill the process as GHC was freezing my box (i7 930, 20GB memory)

This is a known problem, so can we do better? No suspense, the answer is “yes”. The same small benchmark also benches my shiny new buffer embedding module and gives the following results:

================================
| Benchmarking haskus-binary   |
================================
Warming up...
# Benchmarking size: 128b
real    0m1,489s
user    0m1,218s
sys     0m0,371s

# Benchmarking size: 3K
real    0m1,448s
user    0m1,179s
sys     0m0,336s

# Benchmarking size: 3M
real    0m1,477s
user    0m1,245s
sys     0m0,308s

# Benchmarking size: 15M
real    0m1,660s
user    0m1,304s
sys     0m0,409s

# Benchmarking size: 150M
real    0m3,163s
user    0m1,961s
sys     0m1,291s

# Benchmarking size: 1G
real    0m21,852s
user    0m7,123s
sys     0m7,054s

So it seems like we can. At this point I don’t suggest that you use my module: it is still a work in progress and I don’t yet provide an easy way to create a ByteString from the embedded file. Instead I will explain how the two approaches work and how file-embed’s implementation could be enhanced.

file-embed approach

When we embed a file with file-embed, it uses Template Haskell (TH) to create a string literal containing the file contents. GHC compiles string literals into the equivalent of C arrays, and we can get their addresses via TH. This is what file-embed does: it creates an expression calling unsafePackAddressLen with the appropriate address and size.
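
To make this concrete, here is a hand-written sketch of the kind of expression described above. It is not file-embed’s actual generated code: the four-byte literal and the name `embedded` are made up for illustration, and it assumes the bytestring package. A primitive string literal provides an address, and unsafePackAddressLen wraps it in a ByteString without copying.

```haskell
{-# LANGUAGE MagicHash #-}

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackAddressLen)
import System.IO.Unsafe (unsafePerformIO)

-- The literal with a trailing '#' is a primitive string (an Addr#).
-- GHC stores its bytes in the object file, much like a C array.
-- The length (4) must match the number of bytes in the literal.
embedded :: B.ByteString
embedded = unsafePerformIO (unsafePackAddressLen 4 "\x01\x02\x03\x04"#)
{-# NOINLINE embedded #-}

main :: IO ()
main = print (B.unpack embedded) -- prints [1,2,3,4]
```

With a real file, the literal holds every byte of the file, and that literal is what GHC has to carry through its whole pipeline.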

String literals were never meant to hold big chunks of binary data, so it is not surprising that GHC chokes on them. #14741 tracks this issue and there is even a StaticData proposal (disclaimer: I am the author) suggesting that we add new things to GHC to improve the situation.

My approach

Now that I’m re-reading my aforementioned StaticData proposal almost a year after writing it, I realize that the approach I use is basically what I called “Step 2” in that document, and that it subsumes “Step 3”.

Suppose we want to include a file “data.bin”. We follow 3 steps:

  1. Via TH, we generate an assembly file.

This file basically tells the assembler to include the “data.bin” file as-is into the output binary. We specify the section in which to place it (“.data”, or “.rodata” for read-only data), the alignment constraint, an optional file offset and size, and finally we attach a global symbol to it.
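
As an illustration, the following sketch renders such an assembly file from Haskell. It assumes GNU assembler syntax (`.section`, `.global`, `.balign`, `.incbin`). The `Embed` record, its field names and the symbol name `embedded_data` are hypothetical and are not taken from the actual implementation.

```haskell
-- | Hypothetical description of one embedded file.
data Embed = Embed
  { embedSymbol   :: String    -- ^ global symbol attached to the data
  , embedFile     :: FilePath  -- ^ file to include as-is
  , embedReadOnly :: Bool      -- ^ True: ".rodata", False: ".data"
  , embedAlign    :: Int       -- ^ alignment in bytes
  , embedOffset   :: Int       -- ^ number of bytes to skip in the file
  , embedSize     :: Maybe Int -- ^ optional number of bytes to include
  }

-- | Render GNU assembler directives for the given description.
renderAsm :: Embed -> String
renderAsm e = unlines
  [ ".section " ++ (if embedReadOnly e then ".rodata" else ".data")
  , ".global " ++ embedSymbol e
  , ".balign " ++ show (embedAlign e)
  , embedSymbol e ++ ":"
  , ".incbin \"" ++ embedFile e ++ "\"," ++ show (embedOffset e)
      ++ maybe "" (\n -> "," ++ show n) (embedSize e)
  ]

main :: IO ()
main = putStr (renderAsm (Embed "embedded_data" "data.bin" True 8 0 Nothing))
```

The sketch uses `.balign` rather than `.align` because `.balign` always takes a byte count, whereas the meaning of `.align` depends on the target.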

  2. We include this file in the compilation chain.

We use a combination of addForeignFilePath, addTempFile and addDependentFile to add the assembly file to the compilation chain and to register a dependency on the input file.

There is a small issue: ForeignSrcLang doesn’t support assembly files yet. A workaround is to embed the assembly file into a C file by using GCC’s asm directive. It is suboptimal and it would be better to add direct support of assembly files to TH. I have opened a ticket (#16180) with the feature request and a merge request implementing it. I don’t know if/when it will be merged but we can use the workaround in the meantime.
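
The workaround can be sketched roughly as follows, assuming a template-haskell version that provides addTempFile and addForeignFilePath. The helper names `asmInC` and `addAsmViaC` are hypothetical, and the real implementation may differ.

```haskell
import Language.Haskell.TH.Syntax
  ( ForeignSrcLang (LangC), Q
  , addDependentFile, addForeignFilePath, addTempFile, runIO )

-- | Wrap assembly source in a C file made of one top-level GCC asm
-- statement. Each assembly line becomes one C string literal.
asmInC :: String -> String
asmInC asm = "__asm__(\n" ++ concatMap quote (lines asm) ++ ");\n"
  where
    quote l = "  \"" ++ concatMap escape l ++ "\\n\"\n"
    escape '"'  = "\\\""
    escape '\\' = "\\\\"
    escape c    = [c]

-- | Hypothetical helper: register the data file as a dependency and add
-- the wrapped assembly to the compilation chain as a C source file.
addAsmViaC :: FilePath -> String -> Q ()
addAsmViaC dataFile asm = do
  addDependentFile dataFile            -- recompile when the data file changes
  cFile <- addTempFile ".c"            -- temporary path managed by GHC
  runIO (writeFile cFile (asmInC asm))
  addForeignFilePath LangC cFile       -- compile and link it with the module

main :: IO ()
main = putStr (asmInC ".section .rodata\n.incbin \"data.bin\"\n")
```

Because of TH’s stage restriction, `addAsmViaC` has to be called from a splice in a module other than the one that defines it.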

  3. Via TH, we generate an expression to create a Buffer (or a ByteString) from the symbol in the assembly file.

We can get access to a global symbol with a foreign import declaration that imports the symbol’s address.

To build a ByteString, we only have to generate an expression that packs this address together with the size of the data.
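
Putting step 3 together, a minimal sketch could look like this. To keep it self-contained, a small C array defined via addForeignSource stands in for the symbol that the generated assembly file would provide. The symbol name `embedded_data`, its size and its contents are assumptions. unsafePackCStringLen is one possible way to wrap the pointer without copying; the real implementation may use something else.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Language.Haskell.TH.Syntax (ForeignSrcLang (LangC), addForeignSource)
import System.IO.Unsafe (unsafePerformIO)

-- Stand-in for the assembly file: defines the global data symbol in C.
$(do addForeignSource LangC "const unsigned char embedded_data[4] = {1, 2, 3, 4};"
     pure [])

-- The leading '&' imports the address of the symbol, not a function.
foreign import ccall "&embedded_data" embeddedData :: Ptr Word8

-- Wrap the address and the known size (4 bytes here) in a ByteString.
-- The bytes are not copied and must never be written to.
embedded :: B.ByteString
embedded = unsafePerformIO (unsafePackCStringLen (castPtr embeddedData, 4))
{-# NOINLINE embedded #-}

main :: IO ()
main = print (B.unpack embedded)
```

This is meant to be compiled with ghc rather than loaded in an interpreter, since the C stand-in has to be compiled and linked with the module.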

And that’s it.

Conclusion

My approach is much faster, as the benchmark results show. This is because we totally bypass GHC’s pipeline: we don’t try to apply CSE to huge binary chunks; we don’t keep these huge binary chunks in memory; they don’t end up in interface files; etc.

It should be quite portable. I guess we would have to adapt the section names (“.rodata”, “.data”) on non-ELF platforms, but otherwise it should be OK.

As a bonus, with this approach we control the data alignment (useful with vector instructions) and we can choose to make the buffer read-only or not (if one tries to write to a read-only buffer, it segfaults as expected).

You can find my implementation here. It is part of a larger work-in-progress on explicit memory management documented here (chapter on buffer embedding).

If someone is up for implementing this in file-embed, I guess it will make a few people happy.