RESEARCH OF THE PROBLEM OF SPEECH RECOGNITION FOR SOLUTION OF SPECIAL TASKS

O Pomortseva, S Kobzan

doi:10.33042/2522-1809-2022-6-173-91-95

Abstract

In this article, the authors study the topical problem of automatically converting information from audio or video files into text (transcription). Transcription is needed by people with physical disabilities or illnesses and by anyone who must process information as a text file. It is especially relevant at present, under conditions of hostilities. In Ukraine today, transcription is needed to solve complex special tasks, namely searching for and identifying specific content in conversations that are transmitted over various means of communication and stored as audio files. Such tasks are currently both highly relevant and time-consuming. To address this problem, the authors examined the programs most often used for these purposes and identified their strengths and weaknesses. The types of transcription and the software currently in use are presented in separate tables together with their features. Existing automatic speech transcription algorithms still make significant errors, but their main advantage is speed (or synchronicity), and when special tasks are being solved, time is the decisive factor. Increasing the accuracy of the text produced by a transcription program requires terabytes of clearly annotated data. Programs with artificial intelligence, in addition to extracting entities to understand the meaning of speech, can recognize and interpret its form: the combinations of sounds, letters, and syllables that make up words and sentences. Only in this way can a machine decode human speech correctly. An extremely important task is determining the speaker's location (geolocation), down to the specific real estate object. This can be used to collect data and then analyze public sentiment, and to respond rapidly to illegal activities and subsequently localize them.
The authors conclude that, for decoding audio files and automatically converting them into text, a promising direction is to use not merely ready-made services but services with built-in artificial intelligence, so-called self-learning systems.

Keywords: transcription, time code, language decoding, geolocation, database, geographic information system.

Full Text