Abstract

A major problem with embedding third-party software solutions into your own software is that improvements and upgrades to that third-party software are out of your control. The same problem applies to any third-party speech recognition engine used to add voice control features to custom software applications. Among the many speech recognition solutions available, whether free or commercial, none yet delivers 100% recognition quality. The aim of this research is to build and test an approach that improves speech recognition quality without requiring changes to the third-party engine. For this paper, we chose the Microsoft Speech Recognition Engine as the core engine, providing speech recognition that can be integrated into custom software. Next, a neural network was trained with TensorFlow to perform speaker diarization on an audio stream. Finally, Levenshtein's algorithm was used to improve speech recognition quality through a word correction filter developed in C#. To test the integration of speech recognition into custom software, an application was also developed in C#. As a result, speech recognition quality on the test dataset improved by 16-17% on average.
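The abstract does not say exactly how the word correction filter applies Levenshtein's algorithm. Below is a minimal sketch of one plausible approach, written in Python rather than the authors' C#. It assumes a fixed vocabulary of expected words and a maximum edit distance of 2. Both are illustrative assumptions, not values from the paper, and the function names `levenshtein`, `correct_word` and `correct_transcript` are hypothetical. Each recognized word that is not in the vocabulary is replaced by its closest vocabulary entry, provided that entry lies within the distance threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_word(word: str, vocabulary, max_distance: int = 2) -> str:
    """Hypothetical filter step: keep known words; otherwise snap to the
    nearest vocabulary word if it is within max_distance edits."""
    if word in vocabulary:
        return word
    best, best_dist = word, max_distance + 1
    for candidate in sorted(vocabulary):  # sorted so ties resolve deterministically
        d = levenshtein(word, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best


def correct_transcript(text: str, vocabulary, max_distance: int = 2) -> str:
    """Apply the word filter to every token of a recognized (lowercased) phrase."""
    return " ".join(correct_word(w, vocabulary, max_distance)
                    for w in text.lower().split())
```

For example, with the assumed vocabulary `{"open", "close", "file", "window"}`, the misrecognized phrase `"opan the fil"` becomes `"open the file"`. The word `"the"` is left unchanged because no vocabulary entry is within two edits of it. The threshold trades recall against false corrections: a larger value fixes more recognition errors but risks replacing legitimate out-of-vocabulary words.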
