Abstract

This article discusses methods and algorithms for speech-to-text conversion, modern open-source and commercial tools for building such systems, and the use of these technologies in cybersecurity. It proposes the creation of a high-quality speech-to-text conversion system. The mathematical algorithms used to reduce the error rate are analyzed; they make it possible to create unique voice prints and to strengthen protection against forgery. The structure of modern speech-to-text conversion systems is described, namely signal pre-processing, feature extraction, acoustic modeling, language modeling, decoding, and post-processing. For each of these stages, research directions are proposed that can reduce the error rate of the system as a whole. Changing the datasets and the parameters of hidden Markov models, building a high-quality phoneme dictionary, and using language models make it possible to reduce the speech recognition error rate and to apply the system to mixed-language speech such as "surzhyk". The mathematical methods for assessing the quality of a speech-to-text system, in particular the word error rate (WER), and the different ways of calculating it are considered, since they are important for further improvement and optimization of such systems. Both recognition errors and the feasibility of voice forgery are reduced using a range of methods: deep neural networks, hidden Markov models, the Baum-Welch algorithm, N-gram models, attention-based models, and the creation of a high-quality phoneme dictionary, dataset, and fillers. Speech-to-text conversion technology can be used in biometric authentication systems to detect and analyze the unique features of a user's voice. However, modern speech-to-text conversion systems for Ukrainian, Russian, and "surzhyk" need improvement in their acoustic and language components. Scientific works devoted to the research and optimization of these systems for biometric authentication do not fully cover these issues. This motivated further research in this direction, and the aim of this work is to create a speech recognition system with a minimum error rate.
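
The abstract names the word error rate (WER) as the quality metric for a speech-to-text system. As a rough illustration, WER is commonly defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words needed to turn the recognized text into the reference transcript, and N is the number of words in the reference. The sketch below computes it with a word-level edit distance. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example and are not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption for this sketch: words are separated by whitespace and
    compared case-sensitively, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. In practice, both texts are usually normalized (case, punctuation) before scoring; that step is omitted here.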
