Abstract

In this work, voice activity detection (VAD) systems with system-level energy-quality (EQ) scaling are investigated. In contrast to prior single-knob EQ scaling, multiple EQ knobs are selectively inserted throughout the signal chain, from end to end. The EQ knobs are dynamically co-optimized to minimize energy for a given quality target. The analysis shows that system-level EQ optimization provides several benefits and has interesting implications for the performance of machine learning-based classification, as exemplified by decision trees in this work. First, it makes quality degradation more graceful than single-knob scaling, allowing for more aggressive energy reduction under a given quality target while retaining the ability to operate at full quality. Second, proper system-level EQ optimization improves fitting in machine learning-based systems (e.g., decision tree-based ones), suppressing both underfitting and overfitting. The analysis also shows that context-specific retraining significantly improves quality and resolves fitting issues, especially at low input SNR. Measurements on a 28 nm test chip show that system-level EQ scaling can reduce energy by up to 3.5X at 2% accuracy degradation in 10-dB noise, compared to full-quality operation. An iso-technology comparison shows that the minimum energy of 51.9 nJ/frame is lower than prior art by 1.9-74.4X at comparable speech/non-speech hit rates.
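The co-optimization idea can be illustrated with a minimal sketch: search the joint space of knob settings for the lowest-energy configuration that still meets a quality target. Everything in the block below is an assumption made for illustration and is not taken from the paper: the knob names (`adc_bits`, `fft_points`, `tree_depth`), the energy and accuracy-loss numbers, the additive cost model, and the exhaustive search itself, which stands in for whatever optimizer the actual system uses.

```python
from itertools import product

# Hypothetical knobs: setting -> (relative energy, accuracy loss in percentage points).
# The first setting of each knob is full quality. All numbers are made up.
KNOBS = {
    "adc_bits":   {8: (1.00, 0.0), 6: (0.70, 0.4), 4: (0.50, 1.5)},
    "fft_points": {256: (1.00, 0.0), 128: (0.60, 0.6), 64: (0.40, 2.5)},
    "tree_depth": {8: (1.00, 0.0), 6: (0.80, 0.3), 4: (0.65, 1.2)},
}


def evaluate(config):
    """Toy model: block energies add up, and so do accuracy losses."""
    energy = sum(KNOBS[name][value][0] for name, value in config.items())
    loss = sum(KNOBS[name][value][1] for name, value in config.items())
    return energy, loss


def min_energy_config(max_loss, free_knobs=None):
    """Return (energy, loss, config) with minimum energy subject to loss <= max_loss.

    Knobs not listed in free_knobs stay locked at full quality, which lets the
    same function model single-knob scaling.
    """
    names = list(KNOBS)
    free = set(names if free_knobs is None else free_knobs)
    choices = [
        list(KNOBS[name]) if name in free else [next(iter(KNOBS[name]))]
        for name in names
    ]
    best = None
    for values in product(*choices):
        config = dict(zip(names, values))
        energy, loss = evaluate(config)
        if loss <= max_loss and (best is None or energy < best[0]):
            best = (energy, loss, config)
    return best


if __name__ == "__main__":
    target = 2.0  # allowed accuracy loss, in percentage points
    print("multi-knob :", min_energy_config(target))
    for knob in KNOBS:
        print(f"{knob} only:", min_energy_config(target, free_knobs=[knob]))
```

In this toy model, spreading a small quality loss across all three knobs reaches a lower total energy than pushing any single knob to its limit, which mirrors the qualitative argument for system-level scaling. The numbers carry no quantitative meaning.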
