Abstract

This paper presents an ultra-low-power voice activity detection (VAD) system that discriminates speech from non-speech segments of audio signals. The proposed VAD system uses level-crossing sampling. Because this sampling scheme is activity-dependent, it eliminates redundant samples in the non-speech parts of the signal. Feature extraction uses a 40 ms moving window with a 30 ms overlap (i.e., a 10 ms hop). The feature is the number of output samples of the level-crossing analog-to-digital converter (LC-ADC) that fall within each window. This count is the only variable used to distinguish speech from non-speech segments of the audio input. Evaluated on the TIMIT database, the proposed system achieves an average speech hit rate of 91.02% and an average non-speech hit rate of 82.64% over 12 noise types at signal-to-noise ratios (SNRs) of −5, 0, 5, and 10 dB. The complete system, including the LC-ADC, feature extraction, and classification circuits, was designed in 0.18 µm CMOS technology. Post-layout simulations show a power consumption of 394.6 nW and a silicon area of 0.044 mm². These figures make it suitable as an always-on device in an automatic speech recognition system.
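The detection principle above can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' circuit, which is implemented in hardware. It assumes that a densely sampled waveform stands in for the analog input. It counts a crossing whenever the signal moves to a new quantization level k·delta, and it counts those events over 40 ms windows with a 10 ms hop. A window is labeled as speech when its count reaches a threshold. The level spacing `delta`, the `threshold` value, and all function names are hypothetical illustrations and do not come from the paper.

```python
import numpy as np


def level_crossing_events(x, delta):
    """Indices of samples at which x has crossed a level k*delta.

    Assumption: a densely sampled x stands in for the continuous analog
    input; a real LC-ADC is asynchronous and emits events only on crossings.
    """
    # Quantization-level index of every input sample.
    levels = np.floor(np.asarray(x, dtype=float) / delta).astype(np.int64)
    # A change of m levels between neighbouring samples means m crossings.
    jumps = np.abs(np.diff(levels))
    # Repeat sample index i once per crossing detected between i-1 and i.
    return np.repeat(np.arange(1, len(levels)), jumps)


def window_counts(events, n_samples, fs, win_s=0.040, hop_s=0.010):
    """Count LC events in sliding windows (40 ms long, 30 ms overlap)."""
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    starts = np.arange(0, n_samples - win + 1, hop)
    ev = np.sort(events)
    # Events with start <= index < start + win fall in each window.
    lo = np.searchsorted(ev, starts, side="left")
    hi = np.searchsorted(ev, starts + win, side="left")
    return starts, hi - lo


def classify(counts, threshold):
    """Speech if the window's event count reaches a (hypothetical) threshold."""
    return counts >= threshold
```

Silent input produces no crossings, so in this sketch non-speech windows cost nothing beyond the counter. A single comparison per window stands in for the classifier. The paper's actual threshold and any decision smoothing are not specified here.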
