This manuscript presents an ultra-low-power acoustic feature extractor for always-on voice activity detection (VAD). It extracts voice features with a 10-band passive switched-capacitor (SC) bandpass filter (BPF) bank and digitizes them with a passive SC envelope-to-digital converter operating at the low feature rate. The SC feature extractor minimizes the impact of process-voltage-temperature (PVT) variation at the circuit level. It therefore needs no costly chip-wise training or calibration, yet achieves a classification accuracy matching the state of the art. Experimental results from a VAD feature extractor prototype fabricated in a 0.18-μm CMOS process validate the effectiveness of the proposed techniques. The prototype achieves average speech/non-speech hit rates of 90%/86% at a 10 dB signal-to-noise ratio across all tested chips, using a single universal classifier trained with data from one chip. Because the feature extractor is mostly passive, it consumes only 270 nW. In addition, the proposed feature extractor is frequency-scalable, which allows power-efficient implementation of multi-purpose acoustic systems.
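The signal chain summarized above (a 10-band bandpass filter bank, envelope extraction, and digitization at a low feature rate) can be illustrated with a minimal behavioral sketch in software. This is not a model of the switched-capacitor circuit itself. The band range (100 Hz to 5 kHz), geometric band spacing, filter Q, frame length, 8-bit resolution, and all function names are assumptions chosen for illustration only; none of them is taken from the manuscript. The bandpass stage uses a standard digital biquad as a stand-in for the passive SC filter.

```python
import math

def band_centers(n_bands=10, f_lo=100.0, f_hi=5000.0):
    """Geometrically spaced center frequencies (assumed range, not from the paper)."""
    ratio = (f_hi / f_lo) ** (1.0 / (n_bands - 1))
    return [f_lo * ratio ** k for k in range(n_bands)]

def bandpass(x, fs, f0, q=2.0):
    """Second-order digital bandpass with unity peak gain (stand-in for an SC BPF)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def extract_features(x, fs, frame_len, n_bits=8, full_scale=1.0):
    """Return one list of quantized envelope codes per band, one code per frame."""
    levels = (1 << n_bits) - 1
    features = []
    for f0 in band_centers():
        y = bandpass(x, fs, f0)
        codes = []
        # Envelope: mean of the rectified band signal over each frame,
        # so the output rate is fs / frame_len rather than fs.
        for start in range(0, len(y) - frame_len + 1, frame_len):
            env = sum(abs(v) for v in y[start:start + frame_len]) / frame_len
            codes.append(min(levels, int(env / full_scale * levels)))
        features.append(codes)
    return features
```

For example, with an assumed 16 kHz sample rate and `frame_len=1600`, each band produces 10 codes per second of audio, and a pure tone placed at one band's center frequency yields the largest code in that band. The resulting per-frame feature vectors are what a downstream classifier would consume.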