Abstract

This paper presents a scalable design and implementation of the molecular docking application DOCK for a large-scale high-performance computing system, the Sunway TaihuLight supercomputer, which provides a heterogeneous manycore processor architecture consisting of management processing elements (MPEs) and clusters of computing processing elements (CPEs). The key innovation is a novel refactoring of DOCK for the CPEs. To make better use of the computational resources, we apply several optimization techniques: data redundancy minimization so that data fits in cache, software-controlled prefetching into scratchpads, memory access coalescing, software caches, vectorization, and loop unrolling. For a single docking process, the refactored version using both the MPE and the CPE cluster achieves a 260x to 402x speedup over the original ported version, which uses the MPE only. To scale DOCK to the full Sunway TaihuLight system with 10,649,600 cores (including all MPE and CPE cores), we also present an MPI communication domain partition scheme. For docking 9 million small compounds to a Zika virus target protein, we scale to 131,072 MPEs and 8,388,608 CPEs, a total of 8,519,680 cores.
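The core counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the ratio of 64 CPEs per MPE is an assumption inferred from the abstract's own figures (8,388,608 / 131,072), and the names `CPES_PER_MPE` and `cores_used` are hypothetical, not taken from the paper.

```python
# Assumption: 64 CPEs per MPE, inferred from the abstract's figures
# (8,388,608 CPEs / 131,072 MPEs), not stated explicitly in the text.
CPES_PER_MPE = 64

# Full-system core count as given in the abstract (MPEs + CPEs).
FULL_SYSTEM_CORES = 10_649_600


def cores_used(mpes, cpes_per_mpe=CPES_PER_MPE):
    """Return (MPE count, CPE count, total cores) for a run on `mpes` MPEs."""
    cpes = mpes * cpes_per_mpe
    return mpes, cpes, mpes + cpes


mpes, cpes, total = cores_used(131_072)
fraction = total / FULL_SYSTEM_CORES

print(f"MPEs: {mpes:,}  CPEs: {cpes:,}  total: {total:,}")
print(f"share of full system: {fraction:.0%}")
```

Under this assumption, the figures reported for the Zika docking run are mutually consistent, and the run occupies roughly four fifths of the full machine's cores.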
