Abstract
The purpose of this work is to develop a spoken language processing system for smart device troubleshooting using human-machine interaction. This system combines a software Bidirectional Long Short Term Memory Cell (BLSTM)-based speech recognizer and a hardware Long Short Term Memory Cell (LSTM)-based language processor for Natural Language Processing (NLP), linked over the serial RS232 interface. Mel Frequency Cepstral Coefficient (MFCC)-based feature vectors from the speech signal are input directly into a BLSTM network. A dropout layer is added to the BLSTM layer to reduce over-fitting and improve robustness. The speech recognition component combines an acoustic modeler, a pronunciation dictionary, and a BLSTM network for generating query text, and executes in real time with an 81.5% Word Error Rate (WER) and an average training time of 45 s. The language processor comprises a vectorizer, lookup dictionary, key encoder, LSTM-based training and prediction network, and dialogue manager, and transforms query intent to generate response text with a processing time of 0.59 s, 5% hardware utilization, and an F1 score of 95.2%. The proposed system shows a 4.17% decrease in accuracy compared with existing systems, which use parallel processing and high-speed cache memories to perform additional training and thereby improve accuracy. However, the language processor achieves a 36.7% decrease in processing time and a 50% decrease in hardware utilization, making it suitable for troubleshooting smart devices.
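The abstract reports recognizer quality as a Word Error Rate. As a reminder of how that metric is conventionally defined, the following is a minimal sketch, not the authors' evaluation code: it assumes whitespace tokenization and computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example phrases are hypothetical and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()   # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical troubleshooting query: one of five reference words is dropped.
    print(word_error_rate("the wifi is not working", "the wifi not working"))
```

Because the denominator is the reference length, this definition can exceed 1.0 when the hypothesis contains many inserted words.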
Highlights
Manipulating speech signals to extract relevant information is known as speech processing [1]. This work integrates an optimized realization of speech recognition with Natural Language Processing (NLP) and a Text to Speech (TTS) system to perform Spoken Language Processing (SLP) using a hybrid software-hardware design approach.
These results indicate an improvement in accuracy from implementing the Bidirectional Long Short Term Memory Cell (BLSTM)-based speech recognition system.
The results reveal that the language processor performs better in terms of F1 score and processing time.
Summary
This work integrates an optimized realization of speech recognition with Natural Language Processing (NLP) and a Text to Speech (TTS) system to perform Spoken Language Processing (SLP) using a hybrid software-hardware design approach. SLP involves three major tasks: translating speech to text (speech recognition); capturing the intent of the text and determining the action using data processing techniques (NLP); and responding to users through voice (speech synthesis). The Long Short Term Memory cell (LSTM), a class of Recurrent Neural Networks (RNN), is currently the state of the art for continuous word speech recognition and NLP, due to its ability to process sequential data [2]. Several LSTM-based speech recognition techniques are available in the literature. For end-to-end speech recognition, speech spectrograms are chosen directly as the pre-processing scheme and processed by a deep bidirectional LSTM network with a novel Connectionist Temporal