C++20 modules and Boost: an analysis

This document contains my findings from experimenting with C++20 modules and the possibility of a modular Boost. It weighs the costs and benefits of building and distributing existing Boost libraries as C++20 modules, so that users can write import boost in their code. The analysis focuses on header-only libraries.

The experiment is available in this repository.

Update: a follow-up post is now available.

A mental model for modules

Quick recap: modules are a C++20 language-level construct that aims to provide better encapsulation and reduce build times. Modules add a new translation unit type: module units. Each module is compiled into an artifact called a BMI (binary module interface).

In our context, as an oversimplification, we can think of them as "well behaved precompiled headers", with the following differences:

You can import as many modules as you want, while you can only include one precompiled header.
Declarations are not exported from modules by default - export needs to be used on the declarations to be exported.
Macros can’t be exported from modules.

Like PCHs, BMIs are highly non-portable and must be built by the library user (and not by us), and need some cooperation from the build system. For header-only libraries, this is the user’s build system (e.g. CMake), not the one we use (b2).

For each import the user places in their TU, the compiler needs a corresponding BMI. This introduces a build-time dependency (like precompiled headers do). The build system needs to scan TUs for imports to build the dependency graph.

Header units (import <boost/asio.hpp>;) are supposed to be an intermediate step between headers and modules. I haven’t explored them because they don’t have CMake support, and are supposedly slower.

Compiler and tooling support

At the time of writing this article, support for modules is in its early stages. Concretely, when using CMake, the following requirements apply:

CMake 3.28 (stable module support has been out for 2 releases).
MSVC 14.34, clang-16, or gcc-14 (this hasn’t been released yet, but support has already been merged).
When building for UNIX, Ninja 1.11.

Note that build2 has support for modules and even header units (where CMake hasn’t). As we only ship CMake files, I haven’t investigated build2 further.

C++23 enables import std, which is supposed to improve compile-times. This is currently less developed:

MSVC standard library and libc++ ship with standard modules, while libstdc++ doesn’t support them yet.
In both cases, the library ships with the module source code, and the user needs to build the modules themselves.
CMake intends to add an easy way to build the standard modules, but there is nothing yet. Manual approaches are possible for testing but not adequate for production.

No IDE supports modules yet (clangd-19 enters an infinite loop when it sees an import), which disables autocompletion and highlighting.

This means that this feature is very unlikely to be used in production right now. We may encounter early adopters and pet projects, but not many industry users yet.

How to modularize a library

Before measuring compile times, we need a familiar library to be consumed as a module. While Matt Borland and John Maddock have done a great job writing a modular version of Boost.Math, I don’t have realistic, slow-to-compile code to perform adequate benchmarks. So I’ve gone ahead and modularized standalone Asio.

While Math’s approach works, it’s intrusive and requires a lot of work. An easier approach (employed by libc++) is exporting the required names with using declarations:

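// File asio.cxx. Defines how to build Asio as a module so it can be imported
module;

// Global module fragment: regular includes go here, before the module declaration
#include <asio.hpp>

export module asio;

// Re-export only the names that should be visible to importers
namespace asio {
export using asio::io_context;
export using asio::post;
export using asio::any_io_executor;
// ...
}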

There are a number of caveats with this approach though:

constexpr variables need to be inline to be exported. This is not the case in most libraries, although it should be. This requires submitting PRs to libraries. As mentioned by @opensdh, there is a proposal to make this easier.
Some libraries define template specializations in other namespaces, like std. With the current language rules, these cause trouble when employing export using. See this section on the follow-up article for more info about this topic.
Making a definition available using import in some TUs and #include in others seems to work when employing the export using technique, but may cause ODR violations otherwise, as pointed out by Peter Dimov. This is troublesome considering the above point.

Measuring build-time benefits

The experiment involves building a simple, async, coroutine-based server that listens for connections, reads and writes using SSL.

As with PCHs, only executables with several translation units including similar headers may see benefits. The benchmark involves adding translation units, all of them building the same server, with and without modules.

Benchmark conditions:

clang-19 (Linux)
Release CMake build
Building with 3 cores to measure the effects of parallelism.
Modular builds use Asio and the standard library as modules. The time to build such modules is included in the benchmark.

Number of translation units	Build time, modules (s)	Build time, headers (s)
1 TU	09.124	06.909
2 TU	10.708	07.446
3 TU	12.280	09.773
4 TU	14.786	16.057
5 TU	16.065	16.631
6 TU	16.374	17.972
7 TU	20.966	24.695

Benefits are not as big as expected. Compiling with -ftime-trace with modules shows the following:

The slowest artifacts to build are the std module, the Asio module and the server TUs.
The std and asio modules build in parallel (Asio uses includes for std). The server TUs require the module objects and won’t start building until the former are ready.
Each of the two modules takes around 4s to build. This time is spent including headers and parsing declarations.
Building the server TUs takes 6s in total: 2s in the compiler’s frontend (performing instantiations) and 4s in the backend (performing optimizations).
The header version takes 9s, of which 3s are spent parsing headers, a cost that is not present in the module version.
Rebuilds (as happen during local development) are significantly faster in the module version - see my follow-up post for details.

Although the gains are non-zero, I find them slightly disappointing. They may be bigger for bigger projects, debug builds or different libraries. The benefits on rebuilds may be enough for some users to consider modules, though.

Consuming Boost using modules

If we write module code for some Boost libraries, we need to ship the code and provide users with a way to build and consume it. As we ship CMake bindings with our libraries, the obvious path is to enhance this to include building Boost modules.

This is what the end user’s CMake could look like:

# Same as today
find_package(Boost REQUIRED)

# A function defined by find_package(Boost). Builds the Boost.Asio module into a target named asio_module
add_boost_asio_module(asio_module)
# Possibly set compile flags required by dependent targets

# Use the module
add_executable(server main.cpp)
target_link_libraries(server PRIVATE asio_module)

This resembles the pch rule in B2. Under the hood, the function creates a library target that builds the corresponding Boost module. For instance:

function (add_boost_asio_module NAME)
    set(ROOT @CMAKE_INSTALL_PREFIX@)
    add_library(${NAME})
    target_include_directories(${NAME} PRIVATE ${ROOT}/include)
    target_compile_features(${NAME} PUBLIC cxx_std_23)
    target_sources(${NAME} PUBLIC
        FILE_SET modules_public TYPE CXX_MODULES FILES
            ${ROOT}/module/asio.cxx
    )
endfunction()

A function may be more appropriate than an actual target because the module may need to be built several times, with different flags and definitions.

Such an approach requires non-trivial changes in either Boost.CMake or boost_install. Note that vcpkg users would not be able to access this, since vcpkg does not use the official Boost CMake modules. Users of Conan and system package managers would benefit.

Conclusion

Modules are still at a very early stage. We won’t get lots of production users with this.
A "module-only" Boost2 is probably not a good idea at this point.
Modules may provide some compilation speed-up, but they’re not a panacea. Instantiation time isn’t affected by modules, so time spent making your libraries less header-only is not wasted.
Providing modular "bindings" for some Boost libraries may be interesting to gain some real-world experience from early adopters.