MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Qin Li, Tianxiang Lan, Huazhong Yang, Xinjun Liu, Yuze Yang, Huifeng Zhu, Qi Wei, Fei Qiao

doi:10.1109/access.2020.2979799

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 41
License type: CC BY 4.0

Affiliation: Tsinghua University, Beijing Institute of Technology

Abstract

Feature extraction is an essential part of automatic speech recognition (ASR): it compresses raw speech data and enhances features. Conventional implementations in the digital domain have run into energy consumption and processing speed bottlenecks. We therefore propose a Mixed-Signal Processing (MSP) architecture to efficiently extract Mel-Frequency Cepstrum Coefficients (MFCC) features. MSP-MFCC pre-processes speech signals in the analog domain, which significantly reduces the cost of the analog-to-digital converter (ADC) as well as the computational complexity of the digital back-end. Moreover, MSP-MFCC eliminates the time-consuming Fourier transform of the conventional digital realization by improving the processing flow. We fabricated the analog part in 180 nm CMOS mixed-signal technology and measured the chip. The measured results show that the energy consumption of MSP-MFCC is 0.72 µJ/frame and the processing speed is up to 45.79 µs/frame. MSP-MFCC achieves a 95% energy saving and about a 6.4× speedup over the state of the art. Further, using the features extracted by MSP-MFCC, speech recognition simulation reaches an accuracy of 98.2%, which maintains leading performance among its current counterparts. The proposed MFCC extractor is competitive for integration in ultra-low-power, always-on wearable speech recognition applications.
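As a rough sanity check on the figures in the abstract, the relative gains can be turned into the per-frame baseline values they imply. The sketch below assumes that a 95% energy saving means MSP-MFCC uses 5% of the baseline energy, and that a 6.4× speedup means the baseline takes 6.4 times as long per frame. The function and variable names are illustrative and do not come from the paper.

```python
def implied_baseline(energy_uj, time_us, energy_saving, speedup):
    """Back out the baseline per-frame energy and latency implied by relative gains.

    Assumes: new_energy = baseline_energy * (1 - energy_saving)
             new_time   = baseline_time / speedup
    """
    baseline_energy_uj = energy_uj / (1.0 - energy_saving)
    baseline_time_us = time_us * speedup
    return baseline_energy_uj, baseline_time_us


# Figures quoted in the abstract: 0.72 uJ/frame, 45.79 us/frame, 95%, about 6.4x.
base_energy_uj, base_time_us = implied_baseline(0.72, 45.79, 0.95, 6.4)
print(f"implied baseline energy: {base_energy_uj:.1f} uJ/frame")
print(f"implied baseline latency: {base_time_us:.1f} us/frame")
```

Under these assumptions the implied baseline is roughly 14.4 µJ/frame and 293 µs/frame. Because the abstract says "about" 6.4×, these are estimates, not reported measurements.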

Highlights

Speech interaction has become an essential way of human-machine interaction [1], [2], in which automatic speech recognition (ASR) plays a vital role in perceiving speech signals.
Conventional MFCC extracting method: the commonly used Mel-Frequency Cepstrum Coefficients (MFCC) extraction process is shown in Fig. 2 [19], including a microphone in the front-end, an analog-to-digital converter, and feature extraction in the back-end.
The measured power consumption of all filters in the different groups [...] speedup over the state of the art [4]. This is the best performance ever reported for entire MFCC feature extraction.

Summary

INTRODUCTION

Speech interaction has become an essential way of human-machine interaction [1], [2], in which automatic speech recognition (ASR) plays a vital role in perceiving speech signals. Other works [11], [12] propose efficient FPGA-based MFCC extraction for low-cost speech recognition systems. To achieve more energy-efficient and faster feature extraction for wearable automatic speech recognition, a novel mixed-signal processing architecture for extracting MFCC features (MSP-MFCC) is proposed here. MSP-MFCC is investigated, improved, and implemented from the perspectives of architecture, algorithm, and silicon-proven implementation: 1) Architecture Techniques: The proposed mixed-signal processing architecture achieves higher efficiency and faster speed than the state of the art.

CONVENTIONAL MFCC EXTRACTING METHOD

The commonly used MFCC extraction process is shown in Fig. 2 [19], including a microphone in the front-end, an analog-to-digital converter, and feature extraction in the back-end. The post-processing operations, including logarithmic multiplying and Discrete Cosine Transformation (DCT), are then performed to transform the filtered signals into MFCC features.
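For orientation, the conventional all-digital flow described above (framing, Fourier transform, mel filtering, logarithm, DCT) can be sketched in a few lines of NumPy. This is a minimal textbook-style illustration, not the paper's implementation. The sample rate, frame length, hop, FFT size, filter count, and number of coefficients are common defaults assumed here, and all function names are hypothetical. Pre-emphasis and liftering are omitted for brevity.

```python
import numpy as np


def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Frame -> window -> FFT power spectrum -> mel filter bank -> log -> DCT."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop   # requires len(signal) >= frame_len
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)           # small offset avoids log(0)
    return dct_ii(log_energies, n_ceps)


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13): 98 frames, 13 coefficients per frame
```

In this digital flow the FFT is computed for every frame. That per-frame Fourier transform is the step the abstract says MSP-MFCC eliminates.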

HARDWARE IMPLEMENTATION ANALYSIS

ANALOG SQUARE OPERATION

THE MEASURED PERFORMANCE AND COMPARISON

Findings

CONCLUSION

Similar Papers

Time-Delay-Neural-Network-Based Audio Feature Extractor for Ultra-Low Power Keyword Spotting
Hiroshi Fuketa
IEEE Transactions on Circuits and Systems II: Express Briefs | VOL. 69 | 01 Feb 2022

Speaker Independent Speech Recognition using MFCC with Cubic-Log Compression and VQ Analysis
Ashutosh Datar ... Neeraj Kaberpanthi
International Journal of Computer Applications | VOL. 95 | 18 Jun 2014

Perturbation analysis of mel-frequency cepstrum coefficients
Wei-Qiang Zhang ... Dengzhou Yang
01 Nov 2010

Energy-efficient MFCC extraction architecture in mixed-signal domain for automatic speech recognition
Qin Li ... Huifeng Zhu
17 Jul 2018
