Apply an optimized NN model to low-dimensional format speech recognition and exploring the performance with restricted factors

Joy Iong-Zong Chen, Lu-Tsou Yeh

doi:10.1177/00202940221109778

Open Access

Abstract

SCD (speech control detection) has received a lot of attention in recent years. This article implements a framework for speech and text recognition based on a DNN-LSTM (deep neural network-long short-term memory) model. The performance of the resulting framework is analyzed under several conditions: the presence or absence of noise, the number of speech tracks (ST (single track) or DT (double track)), and the dropout ratio used during training. In addition, a speech discriminator model is developed and implemented with the DNN-LSTM framework, using data sets collected by four different people. The model is evaluated on these four data sets, each with 400–5000 training iterations. Three parameters are treated as the dominant factors in evaluating the performance of the completed speech platform. The experimental results clearly show that the DT channel case outperforms the ST channel case. The accuracy of the DNN-LSTM model increases from 0.3339 to 0.9696 and the loss rate decreases from 1.09984 to 0.19298 after the dropout ratio is adjusted during training, which shows that the dropout ratio also strongly affects both accuracy and loss rate. Finally, the results indicate that, compared with a similar method, Bi-LSTM (bi-directional LSTM), the model used here is more efficient while preserving a high level of accuracy.
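The dropout ratio mentioned in the abstract can be illustrated with a minimal sketch of standard inverted dropout in plain Python. This is not the authors' implementation: the function name `dropout`, its parameters, and the sample activations are all assumptions made for illustration. During training, each activation is zeroed with probability `ratio` and the survivors are rescaled so that the expected value is unchanged. At inference time the input passes through untouched.

```python
import random


def dropout(activations, ratio, training=True, rng=None):
    """Inverted dropout (illustrative sketch, not the paper's code).

    Each value is zeroed with probability `ratio`; surviving values are
    divided by (1 - ratio) so the expected activation stays the same.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not training or ratio == 0.0:
        # No dropout at inference time or when the ratio is zero.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - ratio
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical hidden-layer activations.
hidden = [0.5, -1.2, 0.8, 2.0]
print(dropout(hidden, ratio=0.5, rng=random.Random(0)))  # some entries zeroed, rest doubled
print(dropout(hidden, ratio=0.5, training=False))        # unchanged at inference
```

A larger `ratio` removes more units per training step, which regularizes the network more strongly but can also slow learning. This is one plausible reason the choice of ratio affects accuracy and loss as much as the abstract reports.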

Full Text