The fast Fourier transform (FFT) is an essential algorithm in digital signal processing and advanced mobile communications. With the continuous development of modern technology, area- and power-efficient hardware implementations of the FFT have attracted considerable attention. In this article, a novel FFT architecture is proposed. First, a twiddle-factor merging technique reduces the number of resource-expensive multiplications, and thus the hardware area. Next, a common subexpression sharing scheme reuses hardware resources to reduce the area further. In addition, a magnitude-response-aware approximation algorithm is proposed for applications that can tolerate a slight loss of transformation accuracy in exchange for smaller hardware area and lower power dissipation. Logic synthesis shows that, on an application-specific integrated circuit (ASIC), the proposed 16-point FFT architecture reduces hardware area by up to 65.7% and power dissipation by up to 53.1% compared with recently published designs. Similarly, the proposed 32-point FFT architecture reduces hardware area by up to 58.8% and power dissipation by up to 60.0% on ASIC.
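The abstract does not detail the article's own merging and sharing schemes. As a generic illustration of the common subexpression sharing idea only, the following sketch computes two fixed-point twiddle-factor products, cos(π/4)·x and cos(π/8)·x, with shifts and adds instead of multipliers. The 8-bit fractional precision and the names `FRAC_BITS`, `quantize`, and `shared_products` are hypothetical assumptions for illustration, not the authors' design.

```python
# Hypothetical illustration (not the article's architecture): multiplierless
# constant multiplication where two twiddle-factor constants share adders.
import math

FRAC_BITS = 8  # assumed fixed-point precision (fractional bits)


def quantize(c: float) -> int:
    """Round a real coefficient to an integer scaled by 2**FRAC_BITS."""
    return round(c * (1 << FRAC_BITS))


C45 = quantize(math.cos(math.pi / 4))  # 181 = 0b10110101 (5 nonzero bits)
C22 = quantize(math.cos(math.pi / 8))  # 237 = 0b11101101 (6 nonzero bits)


def shared_products(x: int) -> tuple[int, int]:
    """Compute 181*x and 237*x using only shifts and additions.

    A naive binary shift-add realization needs 4 + 5 = 9 adders.
    Sharing the subexpressions s = 5x and t = 165x cuts this to 5 adders.
    """
    s = x + (x << 2)                    # 5x, shared            (adder 1)
    t = (s << 5) + s                    # 160x + 5x = 165x      (adder 2)
    p181 = t + (x << 4)                 # 165x + 16x            (adder 3)
    p237 = t + (x << 6) + (x << 3)      # 165x + 64x + 8x       (adders 4, 5)
    return p181, p237


if __name__ == "__main__":
    x = 1000
    p181, p237 = shared_products(x)
    # Shift right to undo the 2**FRAC_BITS scaling of the coefficients.
    print(p181 >> FRAC_BITS, p237 >> FRAC_BITS)  # prints "707 925"
```

In hardware, shifts by constants are free wiring, so the cost is dominated by the adders; reusing `s` and `t` across both products is what saves area here. Real designs typically choose the shared patterns with a search over signed-digit representations rather than by hand.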