Abstract
High energy physics experiments traditionally have large software codebases primarily written in C++, and the LHCb physics software stack is no exception. Compiling the full stack from scratch can easily take 5 hours or more, even on an 8-core VM. In a development workflow, incremental builds often do not significantly speed up compilation, because even a mere change of the modification time of a widely used header leads to many compiler and linker invocations. Using powerful shared servers is not practical, as users have no control over them and maintenance is an issue. Even though support for building partial checkouts on top of published project versions exists, by far the most practical development workflow involves full project checkouts because of off-the-shelf tool support (git, intellisense, etc.). This paper details a deployment of distcc, a distributed compilation server, on opportunistic resources such as development machines. The best performance is achieved when preprocessing remotely and profiting from the shared CernVM File System. A 10 (30)-fold speedup of elapsed (real) time is achieved when compiling Gaudi, the base of the LHCb stack, when comparing local compilation on a 4-core VM to remote compilation on 80 cores, where the bottleneck becomes non-distributed work such as linking. Compilation results are cached locally using ccache, allowing for even faster rebuilding. A recent distributed memcached-based shared cache is tested, as well as a more modern distributed system by Mozilla, sccache, backed by S3 storage. These allow for global sharing of compilation work, which can speed up both central CI builds and local development builds. Finally, we explore remote caching and execution services based on Bazel, and how they apply to Gaudi-based software for distributing not only compilation but also linking and even testing.
Highlights
The LHCb physics software stack builds on top of Gaudi [1] and comprises a number of interdependent projects, some of which provide libraries while others define applications.
An exception is made for system headers, which are expected to be the same on the server. This rather stringent requirement is ensured in our case by LCG releases being distributed by the globally shared CernVM File System (CVMFS) and by installing the common operating system dependencies using the HEP_OSlibs meta package.
Similar gains are seen across the LHCb software stack. Such developments have the potential to speed up automated builds, which typically run on resource-limited virtual machines (VMs) and for which timely feedback is essential.
Summary
The LHCb physics software stack builds on top of Gaudi [1] and comprises a number of interdependent projects, some of which provide libraries while others define applications (see Figure 1). A more complete example is the LHCb trigger application, called Moore, which together with all the projects it depends on (up to and including Gaudi) defines about 4800 targets, of which 3500 are object files. It is worth noting that very few dependencies limit the concurrency of building those object files, which means that the bulk of the build can be parallelised if many cores are available. Developing on such a codebase can prove difficult due to the amount of resources required for building. The timed build step takes about 13 min using six simultaneous jobs.
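The observation that object files parallelise well while work such as linking does not can be illustrated with a simple Amdahl-style estimate. The sketch below is only a toy model, not the method used in the paper: the function names and the split between serial and parallel work are hypothetical, and the numbers are invented for illustration rather than measured.

```python
def estimated_build_seconds(serial_s: float, parallel_s: float, jobs: int) -> float:
    """Toy estimate: serial work (e.g. linking) is unaffected by the number
    of jobs, while parallel work (e.g. compiling object files) divides evenly."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return serial_s + parallel_s / jobs


def estimated_speedup(serial_s: float, parallel_s: float,
                      jobs_before: int, jobs_after: int) -> float:
    """Ratio of the estimated build times before and after adding jobs."""
    before = estimated_build_seconds(serial_s, parallel_s, jobs_before)
    after = estimated_build_seconds(serial_s, parallel_s, jobs_after)
    return before / after


if __name__ == "__main__":
    # Hypothetical workload: 60 s of non-distributable work and
    # 2400 s of compilation that can be spread over the available jobs.
    serial_s, parallel_s = 60.0, 2400.0
    for jobs in (4, 8, 80):
        seconds = estimated_build_seconds(serial_s, parallel_s, jobs)
        print(f"{jobs:3d} jobs: {seconds:7.1f} s")
    print(f"speedup 4 -> 80 jobs: "
          f"{estimated_speedup(serial_s, parallel_s, 4, 80):.2f}x")
```

With these invented inputs the estimate falls from 660 s on 4 jobs to 90 s on 80 jobs, a speedup of roughly 7.3 against a 20-fold increase in cores. This is the qualitative effect described in the text: once compilation is distributed, the non-distributed part dominates.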