This paper presents a system-, architecture-, and circuit-level co-design for Mel-frequency cepstral coefficient (MFCC) feature extraction in speech keyword recognition. The trade-off between accuracy and power consumption under various background noise conditions is managed by combining an 8-stage radix-2 single-path delay feedback FFT (R2SDF-FFT) with a precision self-adaptive architecture that uses approximate computing. The R2SDF-FFT structure with fine-grained bit-width quantization reduces memory size by 35.7%. Approximate multiplication and addition with dual supply voltages (Dual-Vdd) are proposed to further improve the energy efficiency of the FFT computation. Finally, we present a precision self-adaptive MFCC architecture built on the proposed FFT, which can be dynamically configured to use one of two calculation modes, each with different hardware settings, according to the background noise of the input speech. Implemented and evaluated in a 22 nm technology, the proposed design reduces power consumption by up to 76.3% while increasing accuracy by 0.8%.
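As a rough software illustration of what an 8-stage radix-2 pipeline computes, the sketch below implements a 256-point radix-2 decimation-in-frequency (DIF) FFT one butterfly stage at a time, because each radix-2 stage corresponds to one level of the transform and 8 stages give N = 2^8 = 256 points. An optional per-stage list of fractional bit widths rounds every stage output, loosely mimicking fine-grained bit-width quantization. This is a minimal sketch under stated assumptions, not the authors' hardware design: the SDF delay lines and feedback paths, the approximate arithmetic units, and Dual-Vdd are not modeled, and the function names (`dif_fft`, `quantize`, `bit_reverse`, `naive_dft`) and the uniform 12-bit schedule are hypothetical.

```python
import cmath
import math


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def quantize(v, frac_bits):
    """Round real and imaginary parts to `frac_bits` fractional bits (no saturation modeled)."""
    scale = 1 << frac_bits
    return complex(round(v.real * scale) / scale, round(v.imag * scale) / scale)


def dif_fft(x, frac_bits=None):
    """Radix-2 DIF FFT computed stage by stage.

    frac_bits: optional list with one (hypothetical) fractional bit width per
    stage; when given, each stage's outputs are rounded to that precision.
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two >= 2")
    if frac_bits is not None and len(frac_bits) != stages:
        raise ValueError("need one bit width per stage")
    data = [complex(v) for v in x]
    span = n // 2
    for s in range(stages):
        for start in range(0, n, 2 * span):
            for k in range(span):
                a, b = data[start + k], data[start + k + span]
                twiddle = cmath.exp(-2j * math.pi * k / (2 * span))
                data[start + k] = a + b                      # butterfly sum
                data[start + k + span] = (a - b) * twiddle   # difference times twiddle factor
        if frac_bits is not None:
            data = [quantize(v, frac_bits[s]) for v in data]
        span //= 2
    # DIF leaves the spectrum in bit-reversed order; reorder to natural order.
    return [data[bit_reverse(i, stages)] for i in range(n)]


def naive_dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    N = 256  # 8 radix-2 stages
    x = [math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 17 * t / N)
         for t in range(N)]
    ref = naive_dft(x)
    exact = dif_fft(x)
    approx = dif_fft(x, frac_bits=[12] * 8)
    print("max error, full precision:", max(abs(a - b) for a, b in zip(exact, ref)))
    print("max error, 12 fractional bits per stage:", max(abs(a - b) for a, b in zip(approx, ref)))
```

Because each later butterfly can at most double an earlier rounding error (the twiddle factors have unit magnitude), errors injected in early stages are amplified more than those in late stages. This is one reason why assigning bit widths per stage, rather than one global width, can save storage without a proportional loss of accuracy.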