Abstract

Traditional debuggers are of limited value for modern scientific codes that manipulate large complex data structures. Current parallel machines make this even more complicated, because the data structure may be distributed across processors, making it difficult to view/interpret and validate its contents. Therefore, many applications’ developers resort to placing validation code directly in the source program. This paper discusses a novel debug-time assertion, called a “Statistical Assertion”, that allows using extracted statistics instead of raw data to reason about large data structures, therefore help locating coding defects. In this paper, we present the design and implementation of an ‘extendable’ statistical-framework which executes the assertion in parallel by exploiting the underlying parallel system. We illustrate the debugging technique with a molecular dynamics simulation. The performance is evaluated on a 20,000 processor Cray XE6 to show that it is useful for real-time debugging.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call