Lattice-based cryptography continues to dominate in the second-round finalists of the National Institute of Standards and Technology post-quantum cryptography standardization process. Computational efficiency is primarily considered to evaluate promising candidates for final round selection. In lattice-based cryptosystems, polynomial multiplication is the most expensive computation and critical to improve the performance. This paper proposes an efficient number theoretic transform (NTT) architecture to accelerate the polynomial multiplication. The proposed design applies mixed-radix multi-path delay feedback architecture and flexibly adopts various polynomial sizes. Configurable NTT design is realized to perform forward and inverse NTT computations on a unified hardware, which is then used to develop an efficient polynomial multiplier. The proposed architectures were successfully accelerated on several Xilinx FPGA platforms to directly compare with state-of-the-art works. The implementation results show that the proposed NTT architectures have comparable area-time product and demonstrate <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$1.7\sim 17\times $ </tex-math></inline-formula> performance improvement, and the proposed polynomial multipliers achieve higher performance compared with previous works. Experimental results confirmed the proposed design’s applicability for high-speed large-scale data cryptoprocessors.