We explore the use of distributed processing to enhance the performance of explicit state enumeration based safety model checking. State enumeration based model checkers employ a hash table to cut off search when a state is revisited. Distributed model checkers distribute this table across the processing nodes, employing inter-node messages to perform state lookups. This approach incurs the following penalties: hashing states, looking up hash tables, and possibly exchanging messages. In this paper, we study how to avoid these penalties in the context of safety model checking, assuming that completeness can be sacrificed (acceptable for quick error detection). We employ the basic strategy of distributed random walk: multiple processors move through the state space randomly and without coordination, looking for safety violations without recording visited states. This process has the potential of maximizing CPU utilization, and consequently of greatly increasing the rate of state generation, as the pressure on the memory system as well as on the communication network is minimal. Moreover, the probability that a random walk repeats the same sequence of moves can decrease exponentially with the length of the sequence; thus, the work wasted by occasionally repeating short sequences of searches may be more than offset by the increased state generation rate. Our choices are ideal for distributed systems that have low amounts of memory per node and are interconnected by low-bandwidth networks. We also explore techniques that back off slightly from our extremal choices, by exploring heuristic combinations of breadth-first search (BFS) and random walk (RW) that require a modest amount of hash-table lookup and message exchange. These search methods are natural to combine: BFS requires higher amounts of memory to maintain queues but is guaranteed to find the shortest path to a state, while RW has the opposite characteristics.
We first study these heuristic methods on synthetic benchmarks to gain sharper (more quantifiable) insights. We then conduct studies on some realistic examples as well. We employ up to 10 single-processor machines connected via 100BASE-T Ethernet. Because we use the popular MPI distributed programming library, our code ported easily to other platforms.
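
The random-walk strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `random_walk`, `search`, `toy_successors`, and `toy_is_safe` are hypothetical, the toy model is invented for illustration, and the independent walkers are simulated one after another with different seeds rather than run as separate MPI processes. The point it shows is that each walker keeps only its current trace and never consults a visited-state table.

```python
import random
from typing import Callable, Hashable, Iterable, List, Optional

State = Hashable


def random_walk(
    initial: State,
    successors: Callable[[State], Iterable[State]],
    is_safe: Callable[[State], bool],
    max_steps: int,
    rng: random.Random,
) -> Optional[List[State]]:
    """Run one walk; return the trace to a violating state, or None."""
    state = initial
    trace = [state]
    for _ in range(max_steps):
        if not is_safe(state):
            return trace  # counterexample found
        choices = list(successors(state))
        if not choices:
            return None  # dead end: this walk gives up
        # No hash table: a revisited state is simply explored again.
        state = rng.choice(choices)
        trace.append(state)
    return trace if not is_safe(state) else None


def search(
    initial: State,
    successors: Callable[[State], Iterable[State]],
    is_safe: Callable[[State], bool],
    walkers: int,
    max_steps: int,
    seed: int = 0,
) -> Optional[List[State]]:
    """Simulate uncoordinated walkers sequentially (assumption: in a
    distributed run, each walker would be its own process)."""
    for rank in range(walkers):
        rng = random.Random(seed + rank)  # independent stream per walker
        trace = random_walk(initial, successors, is_safe, max_steps, rng)
        if trace is not None:
            return trace
    return None  # incomplete search: None does not prove safety


# Hypothetical toy model: two counters modulo 4; each move increments one.
def toy_successors(state):
    x, y = state
    return [((x + 1) % 4, y), (x, (y + 1) % 4)]


def toy_is_safe(state):
    return state != (3, 3)  # the single "bad" state
```

Calling `search((0, 0), toy_successors, toy_is_safe, walkers=4, max_steps=10000)` returns a path from `(0, 0)` to `(3, 3)` that is not necessarily the shortest one, which reflects the trade-off against BFS noted in the abstract.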