Abstract

Current supercomputer systems are built from multicore, multisocket processors, and the choice of interconnect is essential. In addition, high-performance, scalable, and reliable numerical software is key to the effective development of new code. ScaLAPACK and PETSc are software packages developed for distributed-memory parallel computer systems. Real computation, however, requires software that is highly tuned for new architectures, such as many-core processors. In the present study, we introduce a high-performance, highly scalable eigenvalue solver aimed at the K-computer system, which is a next-generation supercomputer system. We have developed two versions of this eigenvalue solver, namely, the standard version (eigen_s) and an enhanced-performance version (eigen_sx), both of which were developed on the T2K cluster system housed at the University of Tokyo. Eigen_s uses conventional algorithms: Householder tridiagonalization, the divide and conquer (DC) algorithm, and the Householder back-transformation. These algorithms are carefully implemented using a blocking technique to reduce the overhead of memory traffic and a flexible two-dimensional data distribution to reduce the overhead of data transfer. On the T2K system with 4,096 cores (theoretical peak: 37.6 TFLOPS), eigen_s achieves 3.0 TFLOPS with a 200,000-dimensional matrix. The enhanced version, eigen_sx, uses more advanced algorithms: the narrow-band reduction algorithm, DC for band matrices, and the block Householder back-transformation with WY representation. Even though this version is still in the test stage, eigen_sx has achieved 4.7 TFLOPS with a 200,000-dimensional matrix.
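The three stages named for eigen_s (Householder tridiagonalization, a tridiagonal eigensolve, and Householder back-transformation) can be illustrated with a minimal serial sketch in NumPy. This is not the authors' implementation: it is unblocked, runs on a single process with no data distribution, and uses `numpy.linalg.eigh` as a stand-in for the DC algorithm. The function names `householder_tridiagonalize`, `back_transform`, and `solve_symmetric_eigenproblem` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def householder_tridiagonalize(a):
    """Reduce a symmetric matrix to tridiagonal form T = Q^T A Q.

    Returns T and the list of unit Householder vectors (None where no
    reflection was needed). Unblocked, serial illustration only.
    """
    t = np.array(a, dtype=float)
    n = t.shape[0]
    reflectors = []
    for k in range(n - 2):
        v = t[k + 1:, k].copy()
        # Pick the sign that avoids cancellation when forming v.
        alpha = -np.copysign(np.linalg.norm(v), v[0])
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            reflectors.append(None)  # column is already in the desired form
            continue
        v /= norm_v
        # Apply H = I - 2 v v^T from the left and from the right.
        t[k + 1:, :] -= 2.0 * np.outer(v, v @ t[k + 1:, :])
        t[:, k + 1:] -= 2.0 * np.outer(t[:, k + 1:] @ v, v)
        reflectors.append(v)
    return t, reflectors


def back_transform(reflectors, y):
    """Map eigenvectors y of T to eigenvectors of A by applying Q = H_0 H_1 ..."""
    z = np.array(y, dtype=float)
    for k in reversed(range(len(reflectors))):
        v = reflectors[k]
        if v is not None:
            z[k + 1:, :] -= 2.0 * np.outer(v, v @ z[k + 1:, :])
    return z


def solve_symmetric_eigenproblem(a):
    """Three-stage pipeline: tridiagonalize, solve, back-transform."""
    t, reflectors = householder_tridiagonalize(a)
    d = np.diag(t)           # diagonal of T
    e = np.diag(t, k=1)      # off-diagonal of T
    t_clean = np.diag(d) + np.diag(e, k=1) + np.diag(e, k=-1)
    # Stand-in for the divide and conquer algorithm (assumption of this sketch).
    w, y = np.linalg.eigh(t_clean)
    return w, back_transform(reflectors, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((8, 8))
    a = (m + m.T) / 2.0
    w, z = solve_symmetric_eigenproblem(a)
    print("max residual:", np.abs(a @ z - z * w).max())
```

In this sketch each reflector is applied as a rank-one update, which is memory-bound. The blocking technique mentioned for eigen_s and the WY representation mentioned for eigen_sx aggregate several reflectors so that the updates become matrix-matrix products.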

