Abstract

The rich history of scalable computing research owes much to a rapid rise in computing platform scale in terms of size and speed. As platforms evolve, so must algorithms and the software expressions of those algorithms. Unbridled growth in scale inevitably leads to complexity. This special issue grapples with two facets of this complexity: scalable execution and scalable development. The former results from efficient programming of novel hardware with increasing numbers of processing units (e.g., cores, processors, threads or processes). The latter results from efficient development of robust, flexible software with increasing numbers of programming units (e.g., procedures, classes, components or developers). The progression in the above two parenthetical lists goes from the lowest levels of abstraction (hardware) to the highest (people). This issue's theme encompasses this entire spectrum. The lead author of each article resides in the Scalable Computing Research and Development Department at Sandia National Laboratories in Livermore, CA. Their co-authors hail from other parts of Sandia, other national laboratories and academia. Their research sponsors include several programs within the Department of Energy's Office of Advanced Scientific Computing Research and its National Nuclear Security Administration, along with Sandia's Laboratory Directed Research and Development program and the Office of Naval Research. The breadth of interests of these authors and their customers is reflected in the breadth of applications this issue covers. This article demonstrates how to obtain scalable execution on the increasingly dominant high-performance computing platform: a Linux cluster with multicore chips. The authors describe how deep memory hierarchies necessitate reducing communication overhead by using threads to exploit shared register and cache memory.
On a matrix-matrix multiplication problem, they achieve up to 96% parallel efficiency with a three-part strategy: intra-node multithreading, non-blocking inter-node message passing, and a dedicated communications thread to facilitate concurrent communications and computations. On a quantum chemistry problem, they spawn multiple computation threads and communication threads on each node and use one-sided communications between nodes to minimize wait times. They reduce software complexity by evolving a multi-threaded factory pattern in C++ from a working, message-passing program in C.
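
As a rough illustration of the overlap idea described above, the following minimal C++ sketch runs one dedicated "communication" thread alongside several computation threads during a matrix-matrix multiplication. It is not the authors' implementation. The inter-node message passing is only simulated here by copying row panels of A in shared memory, where a real cluster code would presumably post non-blocking receives. All names (`Panel`, `PanelQueue`, `overlapped_multiply`) and the row-panel decomposition are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Matrix = std::vector<double>;  // row-major, n x n

// Hypothetical unit of "communication": a block of consecutive rows of A.
struct Panel {
  std::size_t row0 = 0;      // first row of A covered by this panel
  std::size_t rows = 0;      // number of rows in the panel
  std::vector<double> data;  // rows * n entries, row-major
};

// Thread-safe queue linking the communication thread to the workers.
class PanelQueue {
 public:
  void push(Panel p) {
    { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(p)); }
    cv_.notify_one();
  }
  void close() {
    { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
    cv_.notify_all();
  }
  // Blocks until a panel is available; returns false once closed and drained.
  bool pop(Panel& out) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<Panel> q_;
  bool closed_ = false;
};

// Computes C = A * B while panels of A are still being delivered.
Matrix overlapped_multiply(const Matrix& a, const Matrix& b, std::size_t n,
                           std::size_t panel_rows, unsigned workers) {
  assert(a.size() == n * n && b.size() == n * n);
  assert(panel_rows > 0 && workers > 0);
  Matrix c(n * n, 0.0);
  PanelQueue queue;

  // Dedicated communication thread. ASSUMPTION: the copy below stands in for
  // receiving a panel from another node; no real message passing happens.
  std::thread comm([&] {
    for (std::size_t r = 0; r < n; r += panel_rows) {
      const std::size_t rows = std::min(panel_rows, n - r);
      Panel p;
      p.row0 = r;
      p.rows = rows;
      p.data.assign(a.data() + r * n, a.data() + (r + rows) * n);
      queue.push(std::move(p));
    }
    queue.close();
  });

  // Computation threads. Each panel maps to a disjoint set of rows of C,
  // so the workers never write to the same element and need no lock on c.
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&] {
      Panel p;
      while (queue.pop(p)) {
        for (std::size_t i = 0; i < p.rows; ++i)
          for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
              c[(p.row0 + i) * n + j] += p.data[i * n + k] * b[k * n + j];
      }
    });
  }

  comm.join();
  for (auto& t : pool) t.join();
  return c;
}
```

Calling `overlapped_multiply(a, b, n, panel_rows, workers)` returns the product as a row-major vector. Workers begin multiplying as soon as the first panel arrives, so delivery of later panels overlaps with computation on earlier ones. Whether this overlap pays off in practice depends on the relative cost of communication and computation, which this shared-memory stand-in does not model.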
