Acoustic Modeling Techniques Research Articles

Introduction: An Automatic Speech Recognition (ASR) system recognizes speech utterances and can therefore be used to convert speech into text for various purposes. These systems are deployed in different environments, clean or noisy, and are used by people of all ages and types. This variability is among the major difficulties faced in developing an ASR system. An ASR system therefore needs to be efficient as well as accurate and robust. Our main goal in implementing an ASR system is to minimize the error rate during both the training and testing phases. The performance of ASR depends on the combination of feature extraction technique and back-end technique. This paper uses a continuous speech recognition system to compare the performance of different combinations of feature extraction techniques and back-end techniques.

Methods: Hidden Markov Models (HMMs), Subspace Gaussian Mixture Models (SGMMs), and Deep Neural Networks (DNNs) with DNN-HMM architecture, namely Karel’s, Dan’s, and the hybrid DNN-SGMM architecture, are used at the back end of the implemented system. Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and Gammatone Frequency Cepstral Coefficients (GFCC) are used as feature extraction techniques at the front end of the proposed system. The Kaldi toolkit was used to implement the proposed work. The system is trained on the Texas Instruments-Massachusetts Institute of Technology (TIMIT) speech corpus for the English language.

Results: The experimental results show that MFCC outperforms GFCC and PLP in noiseless conditions, while PLP tends to outperform MFCC and GFCC in noisy conditions. Furthermore, the hybrid of Dan’s DNN implementation with SGMM performs best for back-end acoustic modeling. The proposed architecture, with PLP feature extraction at the front end and the hybrid of Dan’s DNN implementation with SGMM at the back end, outperforms the other combinations in a noisy environment.

Conclusion: Automatic Speech Recognition has numerous applications in our lives, such as home automation, personal assistants, and robotics. It is highly desirable to build an ASR system with good performance. The performance of Automatic Speech Recognition is affected by various factors, including vocabulary size, whether the system is speaker dependent or independent, whether speech is isolated, discontinuous, or continuous, and adverse conditions such as noise. The paper presented an ensemble architecture that uses PLP for feature extraction at the front end and a hybrid of SGMM and Dan’s DNN at the back end to build a noise-robust ASR system.

Discussion: This paper compares the performance of continuous ASR systems developed using different combinations of front-end feature extraction (MFCC, PLP, and GFCC) and back-end acoustic modeling (mono-phone, tri-phone, SGMM, DNN, and hybrid DNN-SGMM) techniques. Each front-end technique is tested in combination with each back-end technique. Finally, the results of these combinations are compared to find the best-performing combination in noisy and clean conditions.
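The comparison described in the abstract above pairs every front end with every back end and scores each pair by its error rate. The following is a minimal sketch of that bookkeeping, not the authors' Kaldi setup: the abstract does not say which error metric was used, so word error rate (word-level edit distance divided by reference length) is assumed here, and the names `word_error_rate`, `FRONT_ENDS`, `BACK_ENDS`, and `combinations` are hypothetical.

```python
from itertools import product


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumed metric: the abstract only says "error rate".
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Technique names taken from the abstract; the grid itself is illustrative.
FRONT_ENDS = ("MFCC", "PLP", "GFCC")
BACK_ENDS = ("mono-phone", "tri-phone", "SGMM", "DNN", "DNN-SGMM")

# Every front end is paired with every back end.
combinations = list(product(FRONT_ENDS, BACK_ENDS))
```

With three front ends and five back ends, `combinations` holds 15 pairs; each pair would be decoded once on clean speech and once on noisy speech, and the resulting error rates compared.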

The paper focuses on the design of a practical system pipeline for always-listening, far-field spoken command recognition in everyday smart indoor environments that consist of multiple rooms equipped with sparsely distributed microphone arrays. Such environments, for example homes and multi-room offices, present challenging acoustic scenes to state-of-the-art speech recognizers, especially under always-listening operation, due to low signal-to-noise ratios; frequent overlaps of target speech, acoustic events, and background noise; and inter-room interference and reverberation. In addition, recognition of target commands often needs to be accompanied by their spatial localization, at least at the room level, to account for users in different rooms, providing command disambiguation and room-localized feedback. To address these requirements, the use of parallel recognition pipelines is proposed, one per room of interest. The approach is enabled by a room-dependent speech activity detection module that employs appropriate multichannel features to determine speech segments and their room of origin, feeding them to the corresponding room-dependent pipelines for further processing. These pipelines consist of the traditional cascade of far-field spoken command detection and recognition, the former based on the detection of “activating” key-phrases. Robustness to the challenging environments is pursued through a number of multichannel combination and acoustic modeling techniques, thoroughly investigated in the paper. In particular, channel selection, beamforming, and decision fusion of single-channel results are considered, with the latter performing best. Additional gains are observed when the employed acoustic models are trained on appropriately simulated reverberant and noisy speech data and are channel-adapted to the target environments. Further issues investigated concern the inter-dependencies of the various system components, demonstrating the superiority of joint optimization of the components' tunable parameters over their separate or sequential optimization. The proposed approach is developed for the Greek language, exhibiting promising performance on real recordings in a four-room apartment, as well as a two-room office. For example, in the latter, a 76.6% command recognition accuracy is achieved on a speaker-independent test, employing a 180-sentence decoding grammar. This result represents a 46% relative improvement over conventional beamforming.
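The abstract above reports that decision fusion of single-channel results performed best but does not describe the fusion rule. As a rough illustration of the general idea only, the sketch below assumes the simplest possible rule, a majority vote over the command recognized on each microphone channel; the function name `fuse_by_majority` and the tie-breaking choice are hypothetical and should not be read as the paper's method.

```python
from collections import Counter
from typing import Sequence


def fuse_by_majority(channel_hypotheses: Sequence[str]) -> str:
    """Return the command recognized by the most channels.

    Assumption: one hypothesis string per microphone channel. Ties are
    broken in favor of the earliest channel in the input order.
    """
    if not channel_hypotheses:
        raise ValueError("at least one channel hypothesis is required")

    counts = Counter(channel_hypotheses)
    best_count = max(counts.values())
    # Walk the channels in order so that ties resolve deterministically.
    for hypothesis in channel_hypotheses:
        if counts[hypothesis] == best_count:
            return hypothesis
    raise AssertionError("unreachable: best_count always matches a hypothesis")


# Example: three channels, two of which agree.
fused = fuse_by_majority(["lights on", "lights off", "lights on"])
```

A practical system would more likely weight each channel by a confidence score or combine recognition lattices; the vote is used here only because it keeps the example self-contained.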

Articles published on Acoustic Modeling Techniques

Advancing Accessibility through Automatic Speech Recognition and NLP Integration

Evaluation of Interpretable Speech Biomarkers for Monitoring Alzheimer’s Disease and Mild Cognitive Impairment Progression

Evolution of acoustic methods for assessing and managing exposure of gray whales to sound pulses from seismic surveys off Sakhalin Island, Russian Far East

Amalgamation of noise elimination and TDNN acoustic modelling techniques for the advancements in continuous Kannada ASR system

Development of Small Vocabulary Continuous Speech-to-Text System for Kannada Language/Dialects

Analysis of an acoustic propagation model for sources of noise with directivity in indoor environments

Performance Analysis of various Front-end and Back End Amalgamations for Noise-robust DNN-based ASR

"The Diagnostic Evaluation of Switchboard-corpus Automatic Speech Recognition Systems"

A hybrid CNN-LiGRU acoustic modeling using raw waveform sincnet for Hindi ASR

In silico feasibility assessment of extracorporeal delivery of low-intensity pulsed ultrasound to intervertebral discs within the lumbar spine

Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition.

Acoustic Modeling in Speech Recognition: A Systematic Review

Efficient Design Optimization of Acoustic Liners for Engine Noise Reduction

Acoustic camera modeling and compensation technique based on SONAR equation

Addressing noise and pitch sensitivity of speech recognition system through variational mode decomposition based spectral smoothing

Investigation of Various Hybrid Acoustic Modeling Units via a Multitask Learning and Deep Neural Network Technique for LVCSR of the Low-Resource Language, Amharic

Comparison of Phonemic and Graphemic Word to Sub-Word Unit Mappings for Lithuanian Phone-Level Speech Transcription

Spatial mapping of underwater noise radiated from passing vessels using automatic identification system data

Room-localized spoken command recognition in multi-room, multi-microphone environments
