This page collects concepts and interesting findings from my investigation of C++20 modules.

Expect this page to change a lot over the coming days as I learn more!

IDE support

Clangd-based IDEs like VSCode and CLion provide experimental support for C++20 modules. However, the tooling has notable limitations. Binary Module Interfaces (BMIs) are generated on demand, and the integration with build systems remains fragile. This means that while basic use cases will work, you may encounter issues in more complex scenarios. For a detailed discussion on the current state of clangd support, see [clangd-modules].

On dependency scanning

The idea of dependency scanning is: given a set of translation units, find out:

  • What modules does each TU provide? There is no fixed mapping between file names and module names, so this step is required.

  • What modules and header units does each TU require?

Build systems need this information to build the dependency graph, and so do tools like clangd, which need BMIs to be available to do their job. This is what clang-scan-deps is for.

The tricky part is that scanning requires a fully capable preprocessor, because you can have an import in an include, or guarded by an ifdef.
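
To make the scanner's output concrete, here is a deliberately naive sketch that answers the two questions above. It matches module and import declarations line by line with regular expressions and never runs the preprocessor, so it is not how clang-scan-deps works. ScanResult and naive_scan are names made up for this illustration.

```cpp
#include <optional>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: what one translation unit provides and requires.
struct ScanResult {
    std::optional<std::string> provides;  // module named by "export module X;"
    std::vector<std::string> imports;     // modules and header units named by "import ...;"
};

// Naive line-based scan. It does NOT run the preprocessor, so it misses
// imports hidden inside #include'd files and ignores #ifdef guards.
inline ScanResult naive_scan(const std::string& source) {
    static const std::regex provides_re(
        R"re(^\s*export\s+module\s+([\w.:]+)\s*;)re");
    static const std::regex import_re(
        R"re(^\s*(?:export\s+)?import\s+([\w.:]+|<[^>]+>|"[^"]+")\s*;)re");

    ScanResult result;
    std::istringstream lines(source);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        if (std::regex_search(line, m, provides_re)) {
            result.provides = m[1].str();
        } else if (std::regex_search(line, m, import_re)) {
            result.imports.push_back(m[1].str());
        }
    }
    return result;
}
```

Feeding it a TU that wraps an import in an #ifdef for a macro that is never defined still reports that import. A preprocessor-aware scanner exists to avoid exactly this kind of wrong answer.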

How to generate BMIs

You may need to generate a BMI for a library that you didn’t build, because BMI compatibility is stricter than ABI compatibility. For instance, TUs built with different C++ standard levels tend to mix well at the ABI level, but BMIs built with different C++ standard levels are incompatible.

For this reason, build systems should be able to generate BMIs for libraries built by other build systems. This is currently not the case, and requires adopting metadata formats that communicate enough information to achieve this. This is currently under study by SG15.

A key concept here is local preprocessor arguments: these are the arguments required to produce a BMI, but are not required to be consistent across other TUs. If our build system knew these, it could compute the command line to generate a BMI like this:

BMI command line = importer command line - importer local args + importee local args

Example: -DBOOST_ASIO_CONCURRENCY_HINT=1 would be a local preprocessor argument.
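
As a rough sketch of that formula, assuming command lines are modeled as plain lists of strings and that the build system already knows which arguments are local. The helper bmi_command_line and the -DMYAPP_TRACE=1 flag in the usage note are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical helper implementing:
//   BMI args = importer args - importer local args + importee local args
inline Args bmi_command_line(const Args& importer_args,
                             const Args& importer_local,
                             const Args& importee_local) {
    Args result;
    for (const auto& arg : importer_args) {
        // Drop arguments that only make sense for the importer itself.
        const bool is_local =
            std::find(importer_local.begin(), importer_local.end(), arg) !=
            importer_local.end();
        if (!is_local) result.push_back(arg);
    }
    // Add the arguments the imported library needs for its own BMI.
    result.insert(result.end(), importee_local.begin(), importee_local.end());
    return result;
}
```

For an importer built with -std=c++23 -O2 -DMYAPP_TRACE=1, where only the last argument is local, and an importee whose local argument is -DBOOST_ASIO_CONCURRENCY_HINT=1, this yields -std=c++23 -O2 -DBOOST_ASIO_CONCURRENCY_HINT=1.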

TBC: this is not functional as of today. When do we expect it?

Header units

They were supposed to make the transition to modules easier, but their implementability is in question. They’re currently unsupported by CMake, and there seems to be no activity on the issue as of the time of writing [cmake-header-units].

They are currently supported by some compilers, but you need to use the command line. For example, with clang, if you specify a -fmodule-file=iostream.pcm, you can import <iostream>;.

They are complex from a build system perspective: computing the dependency graph depends on the preprocessor state, but importing a header can modify preprocessor state. Their impact on build performance is not yet known (see [challenges-header-units]).

They might be implementable after agreeing on a metadata format that allows specifying which headers are importable, their local preprocessor arguments, and so on.

Include translation

The standard allows certain includes to be automatically translated into imports. That is, #include <iostream> may be translated into import <iostream>;. Note that this NEVER translates to import std, as modules don’t export macros, so this would break in the case of headers like <cerrno> or <cassert>.

This is supported in:

  • clang: when including standard library headers in the GMF, and only when BMIs for these headers are built and made available explicitly.

  • MSVC: this must be explicitly enabled with /translateIncludes, and a MSVC-specific JSON file with metadata is required.

All in all, this is currently neither clean nor transparent.

P3041R0 proposes specifying to the compiler somehow that a header is subsumed by either a named module or another header. This would be a piece of metadata authored by the library author, who knows the specifics of a library. For instance:

  • <iostream> is fully subsumed by the std named module.

  • <cerrno> is subsumed by the std named module, plus some macro definitions.

  • <boost/mp11.hpp> is subsumed by the boost.mp11 module, plus a macro definition for its BOOST_MP11_VERSION macro.

All in all, I think this is the transition we need. The library author is the person that has enough information to determine this translation.

At the moment, I’m trying to emulate this in Boost headers with some macro machinery:

// Replace the whole header content by an import if the user
// is doing modules
#if !defined(BOOST_XYZ_SOURCE) && defined(BOOST_USE_MODULES)
import boost.xyz;
#else

// Regular header

#endif

export using and decl-reachability

As you may know, modules introduce the concept of decl-reachability. The idea is that entities in the GMF are discarded by the compiler (and not included in the BMI) if they are not decl-reachable from the entities that are exported from the module.

This usually translates into "entities introduced by headers that are not used within the module are discarded", which sounds logical. However, you might find surprises. I found this problem with a class that implements the tuple protocol:

// mylib.hpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility> // std::move

namespace mylib {

class Result {
  int value_ {};
public:
  Result() = default;
  int& getValue() & { return value_; }
  const int& getValue() const& { return value_; }
  int&& getValue() && { return std::move(value_); }
};

// Required by the tuple protocol
template <std::size_t I>
int& get(Result& r) { return r.getValue(); }

template <std::size_t I>
const int& get(const Result& r) { return r.getValue(); }

template <std::size_t I>
int&& get(Result&& r) { return std::move(r).getValue(); }

}

// Required by the tuple protocol
template <>
struct std::tuple_size<mylib::Result> : std::integral_constant<std::size_t, 1u> {};

template <>
struct std::tuple_element<0u, mylib::Result>  { using type = int; };




// lib.cppm
module;

#include "mylib.hpp"
#include <array>
#include <tuple>

export module mylib;

export namespace mylib {

// I had expected that exporting Result would be enough
using mylib::Result;

// But you need these two lines for things to work
using mylib::get;
inline void dont_discard_result(std::array<int, std::tuple_size_v<Result>>) {}

}



// main.cpp
import mylib;

int main()
{
    // Works only with the two lines added in lib.cppm
    auto [v] = mylib::Result{};
}

The problem here is that neither get nor the std::tuple_size specialization seems to be decl-reachable from the module purview unless you add the two extra lines marked in lib.cppm above.

If you don’t add these lines, the module builds but main.cpp fails to compile. This also shows that, when adding module support for a library, you need to run your entire test suite using import. Otherwise, chances are that you’re just shipping buggy code.
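
To see why both get and std::tuple_size have to survive, it helps to spell out by hand what a structured binding does for a tuple-like type. The following is a single-file, non-module sketch. The demo namespace and the two functions are hypothetical stand-ins, not the code from mylib.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace demo {  // hypothetical stand-in for mylib

struct Result {
    int value = 7;
};

template <std::size_t I>
int& get(Result& r) { return r.value; }
template <std::size_t I>
const int& get(const Result& r) { return r.value; }
template <std::size_t I>
int&& get(Result&& r) { return std::move(r.value); }

}  // namespace demo

template <>
struct std::tuple_size<demo::Result> : std::integral_constant<std::size_t, 1> {};
template <>
struct std::tuple_element<0, demo::Result> { using type = int; };

// Roughly what `auto [v] = demo::Result{};` does, written out by hand.
inline int manual_binding() {
    // Before C++20, an unqualified get<0>(...) only parses as a template call
    // if some template named get is visible; this using-declaration provides
    // one. The overload actually selected is demo::get, found by ADL.
    using std::get;
    auto e = demo::Result{};                                   // hidden object
    static_assert(std::tuple_size<demo::Result>::value == 1);  // how many names?
    using T = std::tuple_element<0, demo::Result>::type;       // type of name 0
    T&& v = get<0>(std::move(e));                              // get found by ADL
    return v;
}

// The same thing using the language feature.
inline int real_binding() {
    auto [v] = demo::Result{};
    return v;
}
```

Every step depends on a different entity: the tuple_size specialization, the tuple_element specialization, and the get overloads. If any of them is discarded from the BMI, the binding in the importer stops compiling.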

All in all, I’m leaning towards thinking that the ABI-breaking style is the only robust way of modularizing a library.

Modules and the ODR

We all know what the ODR means in C++: an entity can only have a single definition. Entities like inline functions might have multiple definitions across TUs, as long as they are the same. The problem with this rule is that the standard doesn’t mandate a diagnostic if you violate it, resulting in difficult-to-diagnose bugs.

The main way modules help here is by no longer relying on textual inclusion (structural ODR safety). They help in more ways, too, but we need to introduce some concepts first.

In the modules world, every entity is attached to exactly one module. The global module is where entities that don’t belong anywhere else are attached (e.g. think of entities poured into the GMF by including <Windows.h>).

Entities attached to a named module can only have a single definition in a single TU. This applies to all entities, including things like inline functions. The compiler is required to diagnose a violation of this rule if the definitions end up reachable from each other (i.e., they are imported or included into a common file). Not perfect, but much better than what we had before:

// mylib.h
inline int f() {
  #ifdef NDEBUG
  return 42;
  #else
  return 10;
  #endif
}

// mylib.cppm (compiled with -DNDEBUG)
export module mylib;
#include "mylib.h" // f becomes attached to module mylib

// other.cpp (compiled without -DNDEBUG)
import mylib;      // OK
#include "mylib.h" // ODR violation, diagnosed

Attachment works roughly like this:

  • Declared in the GMF: attached to the global module.

  • Declared in a TU that is not a module: attached to the global module.

  • Declared in an extern "C++" block: attached to the global module.

  • Declared in the purview of a named module: attached to that module.

The ABI-breaking style attaches entities to the named module, while the export using style leaves them attached to the global module.

Module linkage

Module linkage is a new type of linkage. If "internal linkage" means "TU local", module linkage means "module local". The rule is ([basic.link-4.10]): if an entity is attached to a named module and not exported, it has module linkage.

So module linkage per se doesn’t change how modules help with ODR violations. It is a consequence of module attachment. Note that entities with module linkage are mangled differently than other entities.

main in a module

At the time of writing, main can’t be placed in a module. [basic.start.main] explicitly forbids both attaching main to a named module and declaring it inside an extern "C++" block, which were the two possibilities.

This is problematic for unit tests that need to cover library implementation details. For instance:

// mylib.cppm
export module mylib;
int f_impl() { return 42; } // not exported
export int f() { return f_impl(); }

If we want to test f_impl, we need to make the unit test part of module mylib. But main can’t be part of module mylib, so a workaround is needed. More on this next week.

[P3618R0] proposes lifting the restriction regarding the linkage block, which would solve the problem. The paper seems to have made progress, but isn’t in the standard draft yet.

The std module

Simple: import std brings the entire standard library in. import std.compat additionally brings C library names like printf into the global namespace.

As you know, modules don’t export macros, so if you need any macros, you need to include the relevant header. This is a list of headers that bring macros in:

#include <cassert>      // assert
#include <cerrno>       // errno, all E* errno macros
#include <cstddef>      // NULL, offsetof
#include <cstdarg>      // va_arg, va_copy, va_end, va_start
#include <cmath>        // HUGE_VAL*, INFINITY, NAN, FP_*, MATH_*, math_errhandling
#include <version>      // __cpp_lib_* library feature-test macros (__cplusplus and __cpp_* are predefined by the compiler)
#include <climits>      // integer limit macros (same as C)
#include <cfloat>       // floating-point limit/characteristic macros (same as C)
#include <cstdint>      // fixed-width limit and optional format macros (same as C)
#include <cinttypes>    // format/scan macros (same as C)
#include <csignal>      // SIG_* macros (same as C)
#include <csetjmp>      // setjmp (and any other C macros from <setjmp.h>)
#include <ctime>        // time/clocks macros (same as C)
#include <clocale>      // LC_* and other C locale macros

The standard dictates that users are free to mix import std with standard includes, and this should work. The reality is that most implementations break when doing this.

Including first and then importing usually works fine, but the other way around causes trouble:

import std;
#include <cmath> // causes redefinition errors in gcc-15 and clang-22

These errors happen even if the include happens in a different TU, as long as one TU is reachable from the other:

// mylib.cppm
module;
#include <cmath>

export module mylib;

// main.cpp
import std;
import mylib; // error! we've made mylib.cppm reachable from main.cpp, and it contains an include

This currently poses a lot of problems, which is unfortunate, since the compile time benefits that import std provides are massive. My current view is that there are two possibilities here:

  • Ifdef-out standard headers when a library is being consumed as a named module, and mandate import std everywhere.

  • Find a way of making the approach in [p3041r0] work.
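
The first option could look roughly like this in a library header. This is only a sketch: MYLIB_SKETCH_USE_MODULES and mylib_sketch are hypothetical names, and I’m assuming the module interface unit defines the macro and does import std before including the header. When the macro is not defined, the block compiles as a plain header.

```cpp
// mylib_sketch.hpp (hypothetical header)

// Standard includes are skipped when the library is built as a named module.
// In that mode the module unit is assumed to have done `import std;` already.
#ifndef MYLIB_SKETCH_USE_MODULES
#include <cmath>
#endif

namespace mylib_sketch {

// Uses std::sqrt, which comes from <cmath> in header mode
// and from the std module in module mode.
inline double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}

}  // namespace mylib_sketch
```

The downside is visible even in this tiny example: every standard include in the library has to be guarded, and consumers of the module must use import std themselves.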

References