Voice recognition applications are widely used in people's daily lives. When it is inconvenient to play audio aloud in public, or the surroundings are too noisy to hear an audio message, people often use the voice recognition application on their mobile devices to transcribe the speech into text so that they can understand the message clearly. However, various errors still occur when using this common application; in particular, the speech is sometimes not transcribed into the correct words. This usually happens when the speaker talks too fast or pronounces words unclearly. Other factors, such as environmental noise, transmission channel quality, and the recording equipment, can also cause this problem. This paper analyzes the causes of inaccurate transcription and discusses potential improvements to the function. We conclude that a parity-check matrix is a good way to address the problem, since it can check whether the digits encoding each word are still correct; the corrected digits are then converted back into syntactically correct words. However, even when the words are syntactically correct after applying the parity-check matrix, they may not be semantically correct. Therefore, Levenshtein distance and Latent Semantic Analysis can be used to compare candidate words and to analyze the latent meanings of words and sentences, so as to find the most suitable replacement words.
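As a rough illustration of two of the techniques named above, the following Python sketch shows a parity-check step and a Levenshtein-distance lookup. It is not the paper's implementation: the abstract does not say which code or vocabulary is used, so the (7,4) Hamming parity-check matrix `H`, the helper names `correct_single_error` and `closest_word`, and the sample vocabulary are all assumptions made for illustration. Latent Semantic Analysis is not covered here.

```python
# Assumed parity-check matrix of the (7,4) Hamming code.
# Column j (1-based) is the binary representation of j, top row = most significant bit.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def correct_single_error(word):
    """Compute the syndrome H * word (mod 2) and flip the bit it points to.

    Returns (corrected_word, position), where position is the 1-based index
    of the flipped bit, or 0 if the syndrome is zero (no error detected).
    """
    word = list(word)
    syndrome = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
    position = int("".join(str(bit) for bit in syndrome), 2)
    if position:
        word[position - 1] ^= 1  # a single-bit error sits at this position
    return word, position

def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]

def closest_word(word, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to word."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))

if __name__ == "__main__":
    received = [1, 1, 1, 0, 1, 0, 0]  # valid codeword 1110000 with bit 5 flipped
    print(correct_single_error(received))
    print(closest_word("recognision", ["recognition", "recession", "precision"]))
```

The parity check can only confirm that the bits form a valid codeword; it says nothing about whether the decoded word fits the sentence. That is why a second, distance-based or semantic step is needed to choose among candidate replacements.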