This document is a follow-up to my previous post on C++20 modules and Boost. Following the generated debates, we will cover:
-
A benchmark about compile-times when re-building code consuming modules.
-
Additional caveats when modularizing a library.
-
A mental model for Boost libraries with separate compilation.
-
A more realistic CMake approach.
Reading the previous post is strongly recommended. With the information in this post, we (the Boost author community) should have enough information to make an informed decision about module support in Boost.
Re-build time benefits
In the previous post, we benchmarked build times for an Asio-based server. We used the modular version of libc++
and a modular standalone Asio prototype to measure build times, varying the number of translation units. We observed very small benefits, increasing with the number of TUs.
These results included the time required to build Asio and the standard library modules. This is what you’d encounter when doing a clean build in a CI system. But what benefits would we see during local development, where the Asio and std
modules have already been built? Here are the results:
Number of re-built translation units | Build time (modules) | Build time (headers) |
---|---|---|
1 TU |
02.584s |
04.655s |
2 TU |
03.027s |
05.593s |
3 TU |
03.703s |
06.632s |
4 TU |
05.849s |
11.198s |
5 TU |
06.897s |
12.280s |
6 TU |
07.331s |
13.197s |
7 TU |
10.043s |
17.589s |
The modular version has considerable gains here, with a consistent ~45% reduction over the header version.
Benchmark conditions:
-
clang-19 (Linux)
-
Debug CMake build (note that the clean build measurements were performed in a Release build. Debug seems more appropriate to measure what happens during local development).
-
Building with 3 cores to measure the effects of parallelism.
-
Modular builds use Asio and the standard library as modules. The time to build such modules is not included in the benchmark.
-
The reported time includes the time elapsed by CMake re-scanning module dependencies for the affected translation units, which is usually not significative.
What if we hit build but had an error in a translation unit? The compiler won’t be able to perform any code generation, so the results are even better:
Build time (modules) | Build time (headers) |
---|---|
00.850s |
03.114s |
This means that errors are reported almost instantly. The measurement was taken by attempting to call an existing function with incorrect arguments at the end of the translation unit.
Takeaway: modules can make the local development feedback loop faster.
Modularizing a library requires adapting it
The promising export using
technique I talked about in my previous post is unfortunately limited to clang.
Under MSVC, mixing standard includes and imports is problematic. This is true even if the mix happens in different translation units, as long as they interact via import
. It shouldn’t be the case, according to the standard, but the fix may take time to arrive. For instance, this is problematic:
-
We modularize the Asio library non-intrusively, using the approach described above. As a result, the
asio
module consumes the standard library via#include
. -
A user consumes both
asio
and thestd
modules viaimport
.
This very much likely applies to our code, too - if we mix including and importing the same Boost libraries, we’re likely going to run into trouble.
Template specializations in namespace std
are also problematic. Have a look at the appendix if you’re curious.
With this in mind, Matt’s approach with Boost.Math becomes the only viable one. This can be summarized as:
-
Create a
BOOST_XXX_MODULE_EXPORT
macro, defined toexport
if we’re doing modules, empty otherwise. -
Annotate all public entities in our headers with the aforementioned macro.
-
Ifdef-out all third-party includes from our headers if we’re doing modules (including standard library and other Boost library includes) - these will be made available with
import
. -
Create a
module.cxx
source file that brings all headers in.
See Matt’s Boost.Math work for an example.
Takeaway 1: modularizing a library requires some changes and extra maintenance effort.
Takeaway 2: we should consume std and other Boost libraries via import
.
Exporting macros: a necessary evil
Modules don’t export macros. You need to use traditional include files to do that. Although the C++ world tries to flee from macros, the reality is that these are necessary at certain points. Concretely, in Boost’s case:
-
Libraries like Boost.Assert or Boost.Config are almost just macros. I propose not creating any modules for these, but just including them in the global module fragment, as you’d do with
<assert.h>
. Some work is required toifdef
-out standard includes when present. -
Some libraries expose macros indicating the presence of platform-specific features. For instance, the
asio::local::stream_protocol
class is only present ifASIO_HAS_LOCAL_SOCKETS
is defined.if constexpr
is probably not suitable for this task (see Godbolt 1 and Godbolt 2). We’d need to refactorconfig.hpp
headers for such libraries to provide the relevant macros. -
Other libraries expose a combination of macros and types, like Boost.Test. Non-trivial refactoring is required to make these work.
Note that using #include <version>
looks compatible with import std
on all platforms.
Takeaway: we can’t just ignore macros. Libraries need to export these as required
Compiled libraries
We have two approaches for compiled libraries:
-
Adapt their implementation (
.cpp
files) so they are conditionally built and consumed using modules. -
Keep their implementation files as they are, and provide modular code for the interface (as we’d do with header-only libraries).
The second approach seems the most suitable one for us, since it’d make our modules compatible with the binary libraries we generate today. Our libraries would be built using b2
as they are today, and would also be importable.
How does export
interact with __declspec(dllexport)
and similar constructs? The following mental model may be useful:
-
Think of
export
a construct for the compiler, affecting declarations and definitions. -
Think of
__declspec(dllexport)
and friends as constructs for the linker, affecting symbols in object files.
In a compiled library, you’d:
-
Mark as both
export
and__declspec(dllexport)
compiled functions that should be visible by importers. -
Mark as
__declspec(dllexport)
compiled functions that are considered implementation details, and are not to be called by the end user. -
Mark as
export
inline functions and templates that may be called by the end user.
Modularizing compiled libraries is almost identical to doing so for header-only libraries (with some details). I’ve done a clang-based proof-of-concept with Boost.Charconv.
Note that this seems to work fine even if the library includes the standard library in its implementation. This makes sense because the library’s translation units are not seen by the compiler when importing, but only by the linker.
Takeaway: we can probably treat compiled libraries in a similar way as header-only ones, regarding modular consumption.
CMake: the tricky part
As I mentioned in my previous post, we need to provide a way for our users to consume our modules, probably using CMake. The simple approach I proposed in my previous post falls short for modules with many dependencies. So let’s consider our options.
I’ve reached the CMake team for help. Their recommended method is:
-
As part of the Boost build, create libraries with the module code. This is, call
add_library
andtarget_sources
once per Boost library. -
Install the generated libraries, include files, and module files following the usual CMake install practice. This ends up creating packages the user can consume via
find_package
, resulting inIMPORTED
targets with some special properties signaling the presence of modules (seeIMPORTED_CXX_MODULES_COMPILE_DEFINITIONS
). -
The consumer calls
find_package
and consumes the module viatarget_link_libraries
. This generates BMIs in the consumer’s project for the required modules and their dependencies.
While this seems the approach to follow, there are a number of caveats:
-
The
add_library
calls end up generating actual binary libraries, even when the original library was header-only. Why does this happen?-
When building a module translation unit, the compiler generates a BMI and an object file.
-
The object file contains initialization code required by the module (module initializer symbols). For instance, Asio requires initializing its error categories.
-
When CMake "builds a module" in the consumer’s project, it builds the BMI, and not the object files.
-
This means we now have
libboost_asio.a
,libboost_beast.a
and all others, which get installed to the user’s machine and linked into the user’s final executable, which can create further compatibility problems.
-
-
In the consumer’s project, BMIs are built just once and use the original project’s compiler options, not the user’s.
-
This falls into similar limitations as distributing the BMI itself.
-
The CMake team is working on enhancing this to build BMIs according to the consumer’s target settings, but there is not an ETA for it.
-
-
There is no clean way for consumers to define configuration macros affecting the library (like
ASIO_DISABLE_THREADS
). Such macros may affect initialization code contained in the newly created libraries (which doesn’t get rebuilt in the consumer), which leads to problems. -
Using this requires us to generate releases with CMake, rather than b2.
As an alternative, we can consider rolling our own CMake machinery, extending what I proposed in my previous post, following the mantra "all module code gets built by the user".
Takeaway: CMake support is the most difficult part in this story. It requires either assuming big limitations or writing a considerable amount of CMake code.
Testing implications
If we are to provide module code for Boost, we need to test its correctness before shipping it to users. Having a module build without errors is probably not enough. For instance, MSVC has a bug causing problems with SEH try
/except
constructs.
The Asio module, which uses the aforementioned constructs, builds with a warning. The problem manifests only in the module’s consumer, when using functionality that calls into code that uses SEH, in the form of a cryptic build error. Forgetting to export functions is another potential issue that won’t be noticed while building.
The most reliable way to test is adapting the library’s test suite to conditionally use modules, in a similar way as we’d be adapting headers. This extra work should be able to detect most real problems.
We’d also need a flow to verify that the generated CMake files can be consumed correctly, in a similar fashion to the "CMake consumer tests" we currently run in most libraries.
Takeaway: modularizing requires additional work regarding testing.
Moving forward
The next step would be to build a more realistic proof-of-concept that demonstrates the modularization process end-to-end. This includes modularizing a library, its dependencies, its tests, and writing any required CMake/b2 code. I think Boost.Url is a good candidate, as it’s compiled, has some header-only dependencies, and is actively maintained.
But before this, we (Boost maintainers) should make a decision, as a community: do we want modules? do we want to assume the extra cost of supporting modules in our codebase? We roughly know the benefits and downsides. It doesn’t make sense to go forward if module-related PRs are going to be rejected.
Boost users - we’d also like to hear from you. Would you use our modules in your codebase if we decide to provide them?
Appendix: template specializations
Consider this piece of code in the Asio library:
namespace std {
template <> struct is_error_code_enum<asio::error::basic_errors>
{
static const bool value = true;
};
}
If we attempt a non-intrusive approach like the following:
// asio.cxx - defines how to build Asio as a module so it can be imported
module;
#include <asio.hpp>
export module asio;
namespace asio::error {
export using asio::error::basic_errors;
// ...
}
What does the following client code see?
import asio;
import std;
static_assert(std::is_error_code_enum<asio::error::basic_errors>::value);
-
Under clang, the assertion succeeds. The
import asio
statement makes the template specialization reachable. -
Under MSVC, the assertion fails. Apparently, the specialization is not considered to be decl-reachable from the module purview and thus discarded.
As pointed out by @opensdh, this seems to be a defect in the standard, so we can expect this behavior to change in the future, possibly matching clang’s. See this GitHub issue for @opensdh’s explanation.