Physically realizable semiconductor electronic devices can contain well over several million atoms even at the nanoscale, so electronic structure simulation of huge atomic systems is key to predicting the performance characteristics of such devices. The empirical tight-binding (TB) model, which describes each atom with a finite set of localized basis orbitals under the assumption of nearest-neighbor coupling, has been extensively adopted to solve the electronic structure of large-scale atomic systems, and the size capability of TB simulations has continuously improved with the aid of parallel computing. Recently, we demonstrated strong scalability of TB simulations on up to 2,500 computing nodes for a model problem of 400 million atoms, where the scalability is limited by the overhead of the collective communications that the core numerical method must perform to obtain a partial eigenspectrum of the sparse Hamiltonian matrix. To reduce the cost of Message Passing Interface (MPI) communication and its effect on the end-to-end simulation time, here we design an equivalent transformation of the sparse-matrix eigensolver and employ a core dedicated to MPI communication together with dynamic scheduling of threads. Rigorous benchmark tests against model problems of various sizes confirm that our method improves the strong scalability of TB simulations by more than 10% in a large computing environment, by reducing the time cost of both core computation and collective communication.
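The communication-hiding idea above, a core dedicated to MPI communication overlapped with dynamically scheduled compute threads, can be sketched in miniature. The following is purely an illustrative analogue, not the paper's implementation: Python threads stand in for compute cores, a shared task queue plays the role of dynamic scheduling, and a dedicated "reducer" thread stands in for the core running MPI collectives, accumulating partial results while the workers are still computing.

```python
# Hypothetical sketch of the pattern: dynamically scheduled compute
# workers plus one thread dedicated to "communication" (reduction).
# This is NOT the paper's eigensolver code; it only illustrates the
# overlap of reduction with chunked computation on a toy dot product.
import threading
import queue

def dot_overlapped(x, y, nworkers=4, chunk=1024):
    """Dot product: workers pull chunks dynamically from a shared queue,
    while a dedicated thread reduces partial sums as they arrive."""
    tasks = queue.Queue()
    for lo in range(0, len(x), chunk):
        tasks.put(lo)
    partials = queue.Queue()  # channel from workers to the "comm" thread

    def worker():
        s = 0.0
        while True:
            try:
                lo = tasks.get_nowait()  # dynamic scheduling: grab next chunk
            except queue.Empty:
                break
            s += sum(a * b for a, b in zip(x[lo:lo + chunk], y[lo:lo + chunk]))
        partials.put(s)  # hand the partial sum off for reduction

    total = [0.0]

    def comm():
        # Stand-in for the dedicated MPI core: reduces partial results
        # concurrently with the ongoing computation of other workers.
        for _ in range(nworkers):
            total[0] += partials.get()

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    reducer = threading.Thread(target=comm)
    reducer.start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reducer.join()
    return total[0]

print(dot_overlapped([1.0] * 4096, [2.0] * 4096))  # → 8192.0
```

In a real MPI+threads setting the same effect is typically obtained by reserving one core per rank for communication progress (e.g., driving nonblocking collectives) while the remaining cores execute dynamically scheduled work; the sketch only mirrors that division of roles.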