In this paper, we introduce a configurable hardware architecture that can be used to generate unified and parametric NTT-based polynomial multipliers that support a wide range of parameters of lattice-based cryptographic schemes proposed for post-quantum cryptography. Both NTT and inverse NTT operations can be performed using the unified butterfly unit of our architecture, which constitutes the core building block in NTT operations. The multitude of this unit plays an essential role in achieving the performance goals of a specific application area or platform. To this end, the architecture takes the size of butterfly units as input and generates an efficient NTT-based polynomial multiplier hardware to achieve the desired throughput and area requirements. More specifically, the proposed hardware architecture provides run-time configurability for the scheme parameters and compile-time configurability for throughput and area requirements. This work presents the first architecture with both run-time and compile-time configurability for NTT-based polynomial multiplication operations to the best of our knowledge. The implementation results indicate that the advanced configurability has a negligible impact on the time and area of the proposed architecture and that its performance is on par with the state-of-the-art implementations in the literature, if not better. The proposed architecture comprises various sub-blocks such as modular multiplier and butterfly units, each of which can be of interest on its own for accelerating lattice-based cryptography. Thus, we provide the design rationale of each sub-block and compare it with those in the literature, including our earlier works in terms of configurability and performance.