Development and integration of speech recognition tools into software applications and an approach to improve of speech recognition quality

Artur Dovbysh, Vladyslav Alieksieiev

doi:10.1109/tcset49122.2020.235505

Abstract

The major problem with embedding third-party software into your own product is that improvements and upgrades to that software are outside your control. The same problem applies to any third-party speech recognition engine used to add voice control to a custom software application. Among the many available solutions, free or commercial, none yet delivers 100% recognition quality. The aim of this research is to build and test an approach that improves speech recognition quality without requiring changes to the third-party engine. For this paper, we chose the Microsoft Speech Recognition Engine as the core engine that provides speech recognition for integration into custom software. Next, a neural network was trained with TensorFlow to perform speaker diarization on the audio stream. Finally, the Levenshtein algorithm was used to improve recognition quality by applying a word correction filter developed in C#. To test the integration of speech recognition into custom software, a test application was also developed in C#. As a result, speech recognition quality on the test dataset improved by 16-17% on average.
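The word correction step can be illustrated with a minimal sketch. The authors' filter was written in C#; the Python version below only shows the general idea of replacing each recognized word with its nearest vocabulary entry by Levenshtein distance. The function names, the `vocabulary` list, and the `max_distance` threshold are assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary entry within max_distance, else the word itself.

    max_distance is a hypothetical threshold chosen for this sketch.
    """
    best, best_distance = word, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(word.lower(), candidate.lower())
        if distance < best_distance:  # ties keep the earlier candidate
            best, best_distance = candidate, distance
    return best


def correct_transcript(text: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Apply the word filter to every whitespace-separated token of a transcript."""
    words = list(vocabulary)
    return " ".join(correct_word(token, words, max_distance) for token in text.split())
```

For example, `correct_transcript("opn the fiel", ["open", "the", "file"])` returns `"open the file"`, while a token with no vocabulary entry within the threshold is left unchanged. A filter of this kind works on the engine's text output only, which is consistent with the stated goal of leaving the third-party engine untouched.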

Full Text