We report a novel speech recognition method using a noise-robust acoustic sensor system that integrates a spatially frequency-separating sensor with a nonlinear amplification algorithm, mimicking the cochlea’s basilar membrane and hair cells. The multichannel piezoelectric artificial basilar membrane (ABM) sensor detects specific sound frequencies with high sensitivity over 0.2–6 kHz. The artificial hair cell signal processing model, inspired by the signal transduction mechanism of inner hair cells, simultaneously enhances the frequency selectivity of the ABM sensor and improves its noise robustness. In a 0 dB SNR noise environment, the system detected voice signals with a maximum output SNR of 57 dB. Furthermore, we converted the frequency-separated signals of speech recorded in various noisy environments into heatmap images and used them as input to a CNN-based speech recognition algorithm. Consequently, our system demonstrated noise-robust recognition performance, achieving 94% accuracy even in noisy environments.
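To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the processing chain summarized above: multichannel, frequency-separated sensor outputs are passed through a hair-cell-like nonlinear transduction step, mapped to a channel-by-time heatmap image, and classified with a small CNN. The channel count, envelope filter settings, frame count, network layout, and the synthetic test signal are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' code): frequency-separated sensor channels ->
# hair-cell-like envelopes -> heatmap image -> small CNN classifier.
# All parameters below (channel count, cutoff, image size, layers) are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
import torch
import torch.nn as nn

FS = 16_000          # sampling rate (assumed)
N_CHANNELS = 8       # number of ABM-like frequency channels (assumed)

def hair_cell_like_transduction(multichannel, fs=FS, cutoff_hz=100.0):
    """Half-wave rectify each channel and low-pass filter its envelope,
    a rough stand-in for inner-hair-cell-style nonlinear transduction."""
    rectified = np.maximum(multichannel, 0.0)
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, rectified, axis=-1)

def to_heatmap(envelopes, n_frames=64):
    """Average envelopes into fixed time frames -> (channels, frames) image in [0, 1]."""
    ch, t = envelopes.shape
    frames = envelopes[:, : t - t % n_frames].reshape(ch, n_frames, -1).mean(axis=-1)
    frames -= frames.min()
    return frames / (frames.max() + 1e-9)

class SmallCNN(nn.Module):
    """Tiny CNN over (1, channels, frames) heatmap inputs; layer sizes are illustrative."""
    def __init__(self, n_classes=10, n_channels=N_CHANNELS, n_frames=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (n_channels // 4) * (n_frames // 4), n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example with synthetic sensor output: 8 channels of noisy tones spanning 0.2-6 kHz.
t = np.arange(FS) / FS
signals = np.stack([np.sin(2 * np.pi * f * t) for f in np.linspace(200, 6000, N_CHANNELS)])
signals += 0.5 * np.random.randn(*signals.shape)            # additive noise
heatmap = to_heatmap(hair_cell_like_transduction(signals))  # shape (8, 64)
logits = SmallCNN()(torch.tensor(heatmap, dtype=torch.float32)[None, None])
print(logits.shape)  # torch.Size([1, 10])
```

In practice, the heatmap images would be generated from the ABM/hair-cell model outputs for each utterance and used to train the CNN on labeled speech commands; the sketch only illustrates the data flow and input shape.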