Abstract

Read–copy update (RCU) is a synchronization mechanism used heavily in key components of the Linux kernel, such as the virtual filesystem (VFS), to achieve scalability by exploiting RCU’s ability to allow concurrent reads and updates. RCU’s design is non-trivial: it takes significant effort to understand it fully, let alone to become convinced that its implementation is faithful to its specification and provides its claimed properties. The situation is not made any easier by the fact that Linux kernels grow more complex over time and run on machines with more and more cores and weak memory models. This article presents an approach to systematically test the code of the main implementation of RCU used in the Linux kernel (Tree RCU) for concurrency errors, under both sequentially consistent and weak memory. Our modeling allows Nidhugg, a stateless model checking tool, to reproduce, within seconds, safety and liveness bugs that have been reported for RCU. Additionally, we present the real cause behind some failures that have been observed in production systems in the past. More importantly, we were able to verify both the publish–subscribe and the grace-period guarantee, with the latter being the basic and most important guarantee that RCU offers, on several Linux kernel versions, for particular configurations. Our approach is effective, both in dealing with the increased complexity of recent Linux kernels and in terms of the time that the process requires. We hold that our effort constitutes a good first step toward making tools such as Nidhugg part of the standard testing infrastructure of the Linux kernel.

Highlights

  • The Linux kernel is used in a surprisingly large number of devices: from PCs and servers to routers and smart TVs

  • This article reports on the use of stateless model checking for testing the core of Tiny Read–copy update (RCU) and Tree RCU, both being RCU implementations used in the Linux kernel

  • Our effort concentrated on particular kernel configurations, but we investigated the effects that weak memory models (TSO and PSO) may have on RCU’s operation

Summary

Introduction

The Linux kernel is used in a surprisingly large number of devices: from PCs and servers to routers and smart TVs. This article reports on the use of stateless model checking (also known as systematic concurrency testing) for testing the core of Tiny RCU and Tree RCU, both being RCU implementations used in the Linux kernel. Using our model, as well as the source code from five different kernel versions directly, we verified both a part of the publish–subscribe guarantee and the grace-period guarantee. We were able to demonstrate that a submitted patch, intended to impose a locking design, in reality fixed a much more serious bug that was responsible for failures observed in production systems some years back, a fact that was previously unknown. We report on this issue and present the exact conditions under which this bug occurs. In non-preemptible kernels, which are the ones we focus on in this work, RCU imposes zero overhead on readers.
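To make the idea of systematic concurrency testing concrete, the following toy sketch explores every schedule of one reader and one updater and checks that the reader never uses a freed object. It is only an illustration under strong assumptions: the step functions, the names (`explore`, `replay`, `wait_for_readers`), and the blocking rule for the grace period are all hypothetical, it assumes sequential consistency, it applies no partial-order reduction, and it does not reflect how Nidhugg or the kernel's Tree RCU work internally.

```python
READER, UPDATER = "reader", "updater"

def initial_state():
    return {"ptr": "old", "freed": set(), "in_cs": False, "local": None,
            "pc": {READER: 0, UPDATER: 0}, "violation": False}

def enabled(state, thread, wait_for_readers):
    """Tell whether the thread's next step may run in this state."""
    pc = state["pc"][thread]
    if thread == READER:
        return pc < 4
    if pc == 1 and wait_for_readers:
        # Toy grace period: block while the reader is inside its critical
        # section (stricter than real RCU, which waits only for readers
        # that started before the grace period).
        return not state["in_cs"]
    return pc < 3

def step(state, thread):
    """Execute the thread's next step, mutating the state in place."""
    pc = state["pc"][thread]
    if thread == READER:
        if pc == 0:
            state["in_cs"] = True                      # enter read-side section
        elif pc == 1:
            state["local"] = state["ptr"]              # subscribe to the pointer
        elif pc == 2:
            state["violation"] = state["local"] in state["freed"]  # use it
        elif pc == 3:
            state["in_cs"] = False                     # leave read-side section
    else:
        if pc == 0:
            state["ptr"] = "new"                       # publish the new version
        elif pc == 2:
            state["freed"].add("old")                  # reclaim the old version
        # pc == 1 is the grace-period wait: a no-op once it is enabled
    state["pc"][thread] = pc + 1

def replay(schedule):
    """Re-execute a schedule from scratch; no visited states are stored."""
    state = initial_state()
    for thread in schedule:
        step(state, thread)
    return state

def explore(wait_for_readers):
    """Return (number of complete schedules, schedules with use-after-free)."""
    complete, violations = 0, []
    stack = [()]
    while stack:
        schedule = stack.pop()
        state = replay(schedule)
        if state["violation"]:
            violations.append(schedule)
            continue
        runnable = [t for t in (READER, UPDATER)
                    if enabled(state, t, wait_for_readers)]
        if not runnable:
            complete += 1
            continue
        stack.extend(schedule + (t,) for t in runnable)
    return complete, violations

if __name__ == "__main__":
    for wait in (True, False):
        total, bad = explore(wait)
        print(f"wait_for_readers={wait}: {total} complete, {len(bad)} violating")
```

With the grace-period wait enabled, no schedule leads to a use-after-free in this toy model. With it disabled, the exploration finds the single schedule in which the reader fetches the old pointer, the updater then publishes, skips the wait, and frees the old object, and the reader finally uses it.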

How RCU works
RCU specifications
Stateless model checking
Stateless model checking Tiny RCU
Tiny RCU implementation
Kernel environment modeling
Results
Tree RCU implementation
High-level explanation
Data structures
Use cases
Registering a callback
Passing through a quiescent state
Reporting a quiescent state to RCU
Interrupts and dynticks-idle mode
Forcing quiescent states
Modeling an SMP platform
Kernel definitions
Synchronization mechanisms
Verifying the publish–subscribe guarantee
Verifying the grace-period guarantee
Test configuration
Test runs
Results and discussion
Presenting the cause of an older kernel bug
Lessons learned
Related work
Concluding remarks
Compliance with ethical standards