ABSTRACT The present study performs direct numerical simulations of turbulent channel flows using a spectral method in a large computational domain. Because the spectral method relies on Fourier discretisation, its parallelisation can incur heavy communication overhead and, consequently, poor scalability. We design and improve the spectral code by exploring parallel techniques, including domain decomposition and data transposition algorithms. In particular, we focus on how 2D domain decomposition, combined with a data transpose algorithm based on non-blocking collective operations, improves parallel performance by mitigating latency through the overlap of computation and communication. Finally, we evaluate the code on the Nurion supercomputer at the KISTI supercomputing centre. The transpose algorithm based on non-blocking collective operations shows the best performance: for the L550 case of grid points, it computes 3.55 times faster than the non-optimised 2D decomposition case on 256 nodes using 16,384 MPI ranks.
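To make the data transposition step concrete, the following is a minimal serial sketch in Python. It assumes a pencil (2D) decomposition of an nx × ny × nz grid over a prow × pcol process grid. The function names, the [k][j][i] array layout and the block-splitting rule are hypothetical illustrations, not details of the code described above. Each simulated rank starts with an x-pencil, which holds all x points and one block each of y and z. After an all-to-all exchange among the ranks that share a z block, each rank holds a y-pencil, which holds all y points and one block each of x and z. This is the layout a Fourier transform along y would need.

```python
def split(n, p, i):
    """Start/end indices of block i when n points are divided among p ranks."""
    base, rem = divmod(n, p)
    start = i * base + min(i, rem)
    return start, start + base + (1 if i < rem else 0)


def x_pencil(nx, ny, nz, prow, pcol, r, c, f):
    """Local x-pencil of rank (r, c), indexed [k][j][i] with x varying fastest."""
    y0, y1 = split(ny, prow, r)
    z0, z1 = split(nz, pcol, c)
    return [[[f(i, j, k) for i in range(nx)] for j in range(y0, y1)]
            for k in range(z0, z1)]


def pack_for_y(local, nx, prow):
    """Cut the x dimension of an x-pencil into one send block per destination rank."""
    blocks = []
    for dest in range(prow):
        x0, x1 = split(nx, prow, dest)
        blocks.append([[row[x0:x1] for row in plane] for plane in local])
    return blocks


def alltoall(send):
    """Serial stand-in for an all-to-all in a row communicator: recv[dst][src] = send[src][dst]."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def unpack_y(recv, prow):
    """Assemble received blocks into a y-pencil indexed [k][i][j] with y varying fastest."""
    nzl = len(recv[0])        # number of local z planes
    nxl = len(recv[0][0][0])  # local x extent of this destination rank
    out = [[[] for _ in range(nxl)] for _ in range(nzl)]
    for src in range(prow):   # sources own ascending y blocks, so j stays ordered
        for k, plane in enumerate(recv[src]):
            for row in plane:  # one row per global j in the source's y block
                for i, v in enumerate(row):
                    out[k][i].append(v)
    return out


def transpose_x_to_y(nx, ny, nz, prow, pcol, c, f):
    """Transpose x-pencils to y-pencils for all ranks in process-grid column c."""
    send = [pack_for_y(x_pencil(nx, ny, nz, prow, pcol, r, c, f), nx, prow)
            for r in range(prow)]
    recv = alltoall(send)
    return [unpack_y(recv[r], prow) for r in range(prow)]
```

In an MPI implementation, the serial `alltoall` stand-in would correspond to an exchange within the row communicator, for example `MPI_Alltoallv`, since block sizes are uneven when the grid does not divide evenly among ranks. Its non-blocking counterpart `MPI_Ialltoallv` returns a request immediately, so independent work, such as transforming another field, can proceed before `MPI_Wait`. This is the kind of computation/communication overlap the abstract refers to. The sketch does not reproduce how the authors schedule that overlap.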