Density functional calculations with a plane-wave basis set are widely used in materials science. Owing to recent developments in high-performance computing, the number of nodes in such machines now greatly exceeds the number of atoms in a typical simulation. Consequently, it is becoming difficult to perform calculations efficiently even when only a fraction of the nodes (e.g., 10%) is used. We have developed a multi-axis decomposition scheme in which both the G-vector and band axes are decomposed and the 3D-FFT communicators are folded compactly. The proposed scheme keeps the innermost do-loops sufficiently long and restrains the growth of MPI communication costs as the number of nodes increases. In an investigation of a wide-gap semiconductor material (SiC), our PHASE/0 DFT code exhibits efficient strong scaling (up to 82,944 nodes) even for a relatively small system of 3,848 atoms, and achieves a peak performance of 2.25 PFLOPS for a 25,200-atom system despite employing 3D-FFT.
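The idea of decomposing two axes at once can be illustrated with a minimal sketch. It is not the PHASE/0 implementation: the function names, the balanced block distribution, and the rank ordering (consecutive ranks share one band group, so the ranks taking part in the 3D FFT of a given band are neighbors) are all assumptions made here for illustration.

```python
def block_range(n_items, n_parts, part):
    """Return the half-open [start, stop) range of a balanced contiguous block."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def decompose(rank, n_g_parts, n_b_parts, n_gvec, n_band):
    """Map a flat rank onto a (band group, G group) grid and return its local ranges.

    Hypothetical layout: rank = band_group * n_g_parts + g_group.
    """
    if not 0 <= rank < n_g_parts * n_b_parts:
        raise ValueError("rank lies outside the process grid")
    b_idx, g_idx = divmod(rank, n_g_parts)
    return {
        "band_group": b_idx,
        "g_group": g_idx,
        "g_range": block_range(n_gvec, n_g_parts, g_idx),
        "band_range": block_range(n_band, n_b_parts, b_idx),
    }


def fft_group(rank, n_g_parts):
    """Ranks that would share one FFT communicator under the assumed layout."""
    first = (rank // n_g_parts) * n_g_parts
    return list(range(first, first + n_g_parts))


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the SiC calculations.
    n_gvec, n_band, n_ranks = 100_000, 512, 64

    # Single-axis decomposition: all 64 ranks split the G-vector axis.
    start, stop = block_range(n_gvec, n_ranks, 0)
    print("G-only loop length on rank 0:", stop - start)

    # Two-axis decomposition: 8 G groups times 8 band groups.
    local = decompose(9, 8, 8, n_gvec, n_band)
    g0, g1 = local["g_range"]
    print("8x8 loop length on rank 9:", g1 - g0)
    print("FFT group of rank 9:", fft_group(9, 8))
```

With the same number of ranks, splitting the band axis as well leaves each rank a longer contiguous G-vector block, which is the sense in which the inner loops stay long. In an MPI code, the groups returned by `fft_group` would typically be turned into sub-communicators, for example with `MPI_Comm_split`.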