Abstract

As a widely used speech-triggered interface, deep-learning-based keyword spotting (KWS) chips require both ultra-low power and high detection accuracy. We propose an always-on keyword spotting chip with acoustic activity detection (AAD) that meets both requirements. Extracted from the feature extractor, the AAD incurs zero hardware overhead and has a zero miss rate; it clock-gates the neural network and the post-processing unit to achieve ultra-low power in silent scenarios. We also propose a tunable detection window that fits keywords of different durations for better accuracy. In addition, a non-overlapping-frame Mel-frequency cepstrum coefficient (MFCC) feature is used in the KWS system to reduce the memory footprint and processing cycles. Implemented in a 28nm CMOS technology, the chip consumes only <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$0.36\mu\mathrm{W}$</tex> for AAD in quiet scenarios and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$0.8\mu\mathrm{W}$</tex> for KWS, operating at a 0.4V supply voltage with an 8kHz clock for the MFCC and 200kHz for the other blocks. The MFCC circuit alone consumes only 170nW. The accuracy reaches 97.8% for two keywords from the Google Speech Commands dataset (GSCD).
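The memory and cycle savings of non-overlapping framing can be illustrated with simple frame-count arithmetic. This is a minimal sketch: the 8kHz sample rate comes from the abstract, while the 32ms frame length and the 50%-overlap baseline are illustrative assumptions, not values stated in the paper.

```python
def num_frames(num_samples: int, frame_len: int, hop: int) -> int:
    """Number of complete analysis frames for a given hop size."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop

fs = 8000                 # sample rate from the abstract
samples = fs * 1          # 1 second of audio
frame_len = 256           # 32 ms frame -- illustrative assumption

# Conventional MFCC: 50% overlap (hop = half a frame) -- assumed baseline.
overlapping = num_frames(samples, frame_len, frame_len // 2)
# Non-overlapping MFCC: hop equals the full frame length.
non_overlapping = num_frames(samples, frame_len, frame_len)

# Non-overlapping framing roughly halves the frames to buffer and process.
print(overlapping, non_overlapping)  # 61 31
```

Since each frame must be windowed, transformed, and stored as a feature vector, halving the frame count directly cuts both the feature memory and the per-second MFCC processing cycles.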
