Abstract

As a widely used speech-triggered interface, deep-learning-based keyword spotting (KWS) chips require both ultra-low power and high detection accuracy. We propose a sub-microwatt KWS chip with acoustic activity detection (AAD) that meets both requirements through the following techniques: first, an optimized feature-extractor circuit using nonoverlapping-framed serial Mel-frequency cepstral coefficients (MFCCs), which halves the computation and data storage; second, a zero-cost AAD that uses the MFCC's first-order output to clock-gate the neural network (NN) and postprocessing (PP) unit, with a 0% miss rate; third, a tunable detection window that adapts to different keyword lengths for better accuracy; and finally, a true-form computation method that decreases data transitions, together with an optimized PP unit. Implemented in a 28-nm CMOS process, this AAD-KWS chip runs from a 0.4-V supply, with an 8-kHz clock for the MFCC circuit and a 200-kHz clock for the other blocks. It consumes 0.36 μW in quiet scenarios when AAD is enabled and 0.8 μW in normal scenarios, where the MFCC circuit consumes only 170 nW. Its accuracy reaches 97.8% for two keywords in the Google Speech Command Dataset (GSCD).
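The computation saving from nonoverlapping framing can be sketched numerically: with a conventional 50%-overlap front end, consecutive frames share half their samples, so dropping the overlap roughly halves the number of frames to process and store per second. The sketch below assumes a 32-ms frame at the chip's 8-kHz sample rate; the frame length is illustrative, not taken from the paper.

```python
# Sketch (not the paper's hardware): frame counts for a 50%-overlap
# front end vs a nonoverlapping one. Halving the frame count halves
# the MFCC computations and the feature storage.

def num_frames(n_samples: int, frame_len: int, hop: int) -> int:
    """Number of complete frames that fit in the signal."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop

FS = 8_000       # 8-kHz sample rate, as used by the chip's MFCC circuit
FRAME = 256      # 32-ms frame (assumed; typical for KWS front ends)

overlapping = num_frames(FS, FRAME, FRAME // 2)  # 50% overlap
nonoverlap = num_frames(FS, FRAME, FRAME)        # nonoverlapping

print(overlapping, nonoverlap)  # 61 vs 31 frames per second
```

Per frame, the MFCC pipeline cost (windowing, FFT, Mel filtering, DCT) is fixed, so frames per second is a direct proxy for both the arithmetic workload and the feature-buffer size.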
