Development of domain-specific automatic speech recognition models based on open-source data

Vladimir Nechaev, Sergey Kosyakov

doi:10.1016/j.procs.2023.12.028

Abstract

Automatic speech recognition models for specialized subject areas are currently built on deep neural network architectures, which require large volumes of training data. However, such models are often poorly suited to specific information systems because they recognize specialized vocabulary inadequately. Further training to improve quality in a specific recognition context is hampered by the difficulty of obtaining sufficient data and the labor involved in labeling it. A pertinent task is therefore to create methods that reduce the effort required to build applied speech recognition models and improve their quality across subject areas. This study used language model-based text topic modeling techniques to adapt open-source data, a deep neural network as the pre-trained speech recognition model, and open-source data sets for training. A method was developed for creating automatic speech recognition models for specialized subject areas. It includes an intermediate step in which the model learns the lexicon of the subject area from open-source data selected by thematic sampling. Using this method, automatic speech recognition models for the energy and healthcare sectors were created and studied; they achieved higher recognition quality than models developed by traditional methods. Validation of the proposed method confirmed its effectiveness. The applied neural network models developed with this method proved able to operate in information systems of energy and healthcare facilities in Russian, English, and Italian without additional training on closed data.
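
The thematic sampling step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes language model-based topic modeling, whereas this example substitutes a plain bag-of-words cosine similarity to keep it self-contained. The function names, the threshold value, and the sample transcripts are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def thematic_sample(transcripts, domain_description, threshold=0.2):
    """Keep transcripts whose similarity to the domain description
    meets the (assumed, illustrative) threshold."""
    domain_vec = bag_of_words(domain_description)
    return [
        t for t in transcripts
        if cosine(bag_of_words(t), domain_vec) >= threshold
    ]


# Hypothetical open-source transcripts and an energy-sector description.
transcripts = [
    "the turbine generator supplies power to the grid substation",
    "the recipe needs two eggs and a cup of flour",
    "voltage at the transformer dropped after the grid fault",
]
energy_domain = "power grid turbine generator transformer voltage substation"
selected = thematic_sample(transcripts, energy_domain)
```

In this sketch the two energy-related transcripts pass the filter and the cooking transcript does not. In a setting like the one the abstract describes, the selected subset would then serve as data for the intermediate training step that teaches the model the subject-area lexicon.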
