Abstract

Enforcing the orthogonality of approximate wavefunctions becomes one of the dominant computational kernels in planewave based Density Functional Theory electronic structure calculations that involve thousands of atoms. In this context, algorithms that enjoy both excellent scalability and single processor performance properties are much needed. In this paper we present block versions of the Gram–Schmidt method and we show that they are excellent candidates for our purposes. We compare the new approach with the state of the art practice in planewave based calculations and find that it has much to offer, especially when applied on massively parallel supercomputers such as the IBM Blue Gene/P Supercomputer. The new method achieves excellent sustained performance that surpasses 73 TFLOPS (67% of peak) on 8 Blue Gene/P 1 racks (32 768 compute cores), while it enables more than a two fold decrease in run time when compared with the best competing methodology.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.