The fast Fourier transform (FFT), a prevalent algorithm for efficient computation of the discrete Fourier transform, is one of the major sub-modules in numerous real-time signal processing systems. This article presents a new CORDIC-based high-radix FFT architecture. Because complex rotation is the most time-consuming elementary operation of the FFT, the number of complex rotations is reduced by adopting radix-8 FFT computation. In addition, the complex rotation is realized with CORDIC rather than its multiplier–accumulator (MAC)-based counterpart, which further economizes the VLSI implementation of the proposed FFT architecture. Furthermore, CORDIC blocks are not required in the last three stages of the radix-8 FFT computation: they are replaced by SCALE blocks, because the rotations in those stages can be expressed as $$\pi /4$$ or its multiples. RAM is arranged in memory banks to provide parallel data path operations, and RAM switching is performed between stages to sustain continuous data flow while avoiding data access hazards. The throughput of the proposed radix-8 architecture is eight outputs per clock cycle, and the maximum clock frequency is limited only by the propagation delay of an adder. Hardware utilization and a comparative performance evaluation are reported to demonstrate the advantages of the proposed architecture. A prototype of the proposed radix-8 architecture has been successfully implemented on a Zynq UltraScale+ FPGA using Xilinx Vivado 18.2 software to verify its feasibility in practical applications.
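The two rotation styles mentioned above can be illustrated with a minimal floating-point sketch. It is not the proposed hardware: the function names `cordic_rotate` and `rotate_pi4_multiple`, the iteration count `N_ITER`, the counterclockwise sign convention, and the use of floating point instead of fixed-point shifts are all assumptions made for illustration. The first function performs a general rotation with shift-and-add style micro-rotations, and the second shows why a rotation by a multiple of $$\pi /4$$ needs only swaps, sign changes, one add/subtract pair, and a constant scale by $$1/\sqrt{2}$$.

```python
import math

N_ITER = 16  # assumed number of CORDIC micro-rotations (illustrative only)

# Elementary angles atan(2^-i) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, theta):
    """Rotate (x, y) counterclockwise by theta radians (|theta| below ~1.74).

    Each iteration uses only a multiplication by 2^-i (a shift in hardware)
    and additions; the constant gain is compensated once at the end.
    """
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0          # rotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]                   # residual angle
    return x / GAIN, y / GAIN


def rotate_pi4_multiple(x, y, k):
    """Rotate (x, y) counterclockwise by k * pi/4 without a general rotator."""
    k %= 8
    for _ in range(k // 2):                  # quarter turns: swap and negate
        x, y = -y, x
    if k % 2:                                # remaining pi/4: add/sub and scale
        s = math.sqrt(0.5)
        x, y = (x - y) * s, (x + y) * s
    return x, y


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, 0.3))
    print(rotate_pi4_multiple(1.0, 0.5, 3))
```

In a fixed-point implementation the factors `2.0 ** -i` would become arithmetic right shifts, and the scale by $$1/\sqrt{2}$$ would typically be a small constant shift-add network; both details are outside what this sketch models.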