Abstract

Distributed and concurrent applications often have subtle bugs that only get exposed under specific schedules. While these schedules may be found by systematic model checking techniques, in practice, model checkers do not scale to large systems. On the other hand, naive random exploration techniques often require a very large number of runs to find the specific interactions needed to expose a bug. In recent years, several random testing algorithms have been proposed that, on the one hand, exploit state-space reduction strategies from model checking and, on the other, provide guarantees on the probability of hitting bugs of certain kinds. These existing techniques exploit two orthogonal strategies to reduce the state space: partial-order reduction and bug depth. Testing algorithms based on partial-order techniques, such as RAPOS or POS, ensure non-redundant exploration of independent interleavings among system events by imposing an equivalence relation on schedules and ideally exploring only one schedule from each equivalence class. Techniques based on bug depth, such as PCT, exploit the empirical observation that many bugs are exposed by the clever scheduling of a small number of key events. They bias the sample space of schedules to cover all executions of small depth, rather than the much larger space of all schedules. At this point, there is no random testing algorithm that combines the power of both approaches. In this paper, we provide such an algorithm. Our algorithm, trace-aware PCTCP (taPCTCP), extends and unifies several different algorithms in the random testing literature. It samples the space of low-depth executions by constructing a schedule online, while taking dependencies among events into account. Moreover, the algorithm comes with a theoretical guarantee on the probability of sampling a trace of low depth: the probability decreases exponentially with the depth but only polynomially with the number of racy events explored. We further show that the guarantee is optimal among a large class of techniques. We empirically compare our algorithm with several state-of-the-art random testing approaches for concurrent software on two large-scale distributed systems, Zookeeper and Cassandra, and show that our approach is effective in uncovering subtle bugs and usually outperforms related random testing algorithms.
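
To make the depth-bounded sampling concrete, the sketch below shows a PCT-style priority scheduler over message chains. It is a minimal illustration assuming a fixed, upfront chain partition; the actual taPCTCP builds chains online from the happens-before relation and counts only racy events toward the bound. The function name `pct_style_run` and the toy workload are illustrative assumptions, not the paper's implementation.

```python
import random

def pct_style_run(chains, depth):
    """One randomized run in the PCT style: always execute the next
    event of the highest-priority chain, and demote a priority at
    depth-1 randomly chosen steps.  With k chains and n events this
    samples any given execution of bug depth <= depth with probability
    at least 1/(k * n**(depth-1)); taPCTCP tightens n to the number of
    racy events by tracking dependencies online."""
    n_events = sum(len(evts) for evts in chains.values())
    # Random initial priorities: a random permutation of the chains.
    order = random.sample(list(chains), len(chains))
    prio = {c: i for i, c in enumerate(order)}
    # depth-1 priority change points among the n scheduling steps.
    change_points = set(random.sample(range(n_events), depth - 1))
    pending = {c: list(evts) for c, evts in chains.items()}
    schedule = []
    for step in range(n_events):
        enabled = [c for c in pending if pending[c]]
        chain = max(enabled, key=prio.get)        # highest priority wins
        schedule.append(pending[chain].pop(0))
        if step in change_points:                 # priority change point
            prio[chain] = min(prio.values()) - 1  # demote below all others
    return schedule

# Toy workload: two chains of message deliveries to the same process.
print(pct_style_run({"p1": ["m1", "m2"], "p2": ["m3"]}, depth=2))
```

The demotions at change points are what allow a run to realize orderings in which a specific chain must yield exactly depth-1 times; this is where the exponential dependence on depth in the guarantee comes from.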

Highlights

  • We consider the problem of systematically testing distributed message-passing programs

  • A number of testing algorithms have explored systematic randomization as an effective way to sample from large state spaces [Burckhardt et al. 2010; Kulahcioglu Ozkan et al. 2018; Majumdar and Niksic 2018; Nagarakatte et al. 2012; Sen 2007; Yuan et al. 2018]

  • The concurrency bugs in such distributed systems are known to be "deep," i.e., triggering the bug requires a specific ordering of a large number of messages [Leesatapornwongsa et al. 2016]

Summary

Introduction

We consider the problem of systematically testing distributed message-passing programs. The space of interleavings grows exponentially with the length of the execution; a key challenge in systematic testing is to control the set of interleavings explored while still guaranteeing that many bugs will be found. Formal techniques, such as model checking, provide different heuristics to reduce the state space and are complete in the limit, but in practice they seldom finish exploring all schedules of a large concurrent system within reasonable testing budgets. Two effective approaches in this vein have been trace-aware random testing and depth-bounded random testing.
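
As a small illustration of the trace-aware side, the following sketch checks whether two schedules belong to the same Mazurkiewicz trace, i.e., whether they agree on the order of every pair of dependent events; a trace-aware tester ideally samples one schedule per such equivalence class. The function `same_trace` and the `dependent` predicate (here: two messages delivered to the same process) are assumed for illustration, not taken from the paper.

```python
def same_trace(sched_a, sched_b, dependent):
    """Two schedules are trace-equivalent iff they contain the same
    events and agree on the relative order of every dependent pair;
    reordering independent events never changes the trace."""
    def dependent_order(sched):
        pos = {e: i for i, e in enumerate(sched)}
        return {(e, f) for e in sched for f in sched
                if pos[e] < pos[f] and dependent(e, f)}
    return (set(sched_a) == set(sched_b)
            and dependent_order(sched_a) == dependent_order(sched_b))

# Toy example: deliveries to different processes are independent, so
# the two schedules below are two members of the same trace.
dep = lambda e, f: e.split(":")[0] == f.split(":")[0]
print(same_trace(["p1:m1", "p2:m2"], ["p2:m2", "p1:m1"], dep))  # True
```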

