Abstract

To address the power and area bottleneck imposed by the frontend feature extractor relative to the backend neural network in on-device keyword spotting (KWS), we propose two time-mode analog signal processing (ASP) circuit techniques, showcased in an analog audio feature extractor chip that advances the state of the art in power and area efficiency. Time-mode analog filterbank interpolation uses digital XOR gates to double the number of outputs of an analog bandpass filterbank. Time-mode analog rectification uses a single digital XOR gate as an analog full-wave rectifier. The 65 nm low-power (LP) CMOS chip consumes only 80 nW and occupies 0.53 mm² to extract a 31-element digital auditory feature vector from an analog audio input signal. This represents 18× and 3.3× improvements in power/feature and area/feature, as compared, respectively, to the most area- and power-efficient published analog audio feature extractor chips. Classification accuracy remains competitive at >90% across ten keywords, as evaluated by feeding the chip's digital output directly into a small-footprint software backend classifier.
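
The per-feature figures of merit behind the 18× and 3.3× comparisons can be worked out from the numbers stated above. The following is a minimal sketch; the variable names are illustrative, and the "implied baseline" values are simply back-calculated from the rounded ratios in the abstract, so they are approximations rather than figures reported for any specific prior chip.

```python
# Figures stated in the abstract.
power_nw = 80.0        # total chip power, in nW
area_mm2 = 0.53        # total chip area, in mm^2
num_features = 31      # elements in the output feature vector

# Per-feature figures of merit.
power_per_feature_nw = power_nw / num_features
area_per_feature_mm2 = area_mm2 / num_features

# Improvement ratios stated in the abstract.
power_improvement = 18.0
area_improvement = 3.3

# Assumption: back-calculated from the rounded ratios, so approximate only.
implied_baseline_power_nw = power_per_feature_nw * power_improvement
implied_baseline_area_mm2 = area_per_feature_mm2 * area_improvement

print(f"power/feature: {power_per_feature_nw:.2f} nW")
print(f"area/feature:  {area_per_feature_mm2:.4f} mm^2")
print(f"implied baseline power/feature: ~{implied_baseline_power_nw:.1f} nW")
print(f"implied baseline area/feature:  ~{implied_baseline_area_mm2:.3f} mm^2")
```

Normalizing by the number of features allows a comparison between extractors that produce feature vectors of different lengths, which is why the abstract reports improvements per feature rather than in total power or area.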
