Automatic recognition and representation of text in the form of audio stream

L V Serebryanaya, I E Lasy

doi:10.35596/1729-7648-2021-19-6-51-58

Abstract

The problem of automatically generating speech from a text file is considered. An analytical review of existing software designed to recognize texts and convert them into an audio stream has been carried out, and the advantages and disadvantages of these products are assessed. On this basis, it is concluded that developing software for automatically generating an audio stream from Russian text is a relevant task. Artificial neural network models used for speech synthesis are analyzed, after which a mathematical model of the software being developed is built. The model consists of three components: a convolutional encoder, a convolutional decoder, and a converter. The software architecture is designed; it includes a graphical interface, an application server, and a speech synthesis system. Several algorithms have been developed: for preprocessing text before it is loaded into the software, for converting the audio files of the training set and training the network, and for generating speech from arbitrary text files. The resulting software is a single-page application with a web interface for interacting with the user. Its quality was assessed with the mean opinion score (MOS), the average of ratings given by different raters. The aggregated score is sufficiently high, which suggests that the stated tasks have been accomplished.
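The mean opinion score is the arithmetic mean of the ratings collected from raters. The sketch below assumes the common 1 to 5 rating scale and adds a normal-approximation 95% confidence interval, a frequent way of reporting MOS; the function name, the interval and the ratings are hypothetical illustrations, not the authors' evaluation code or results.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings: list[int], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of an approximate confidence interval).

    Assumes ratings on the common 1-5 rating scale (an assumption, not from the paper).
    The half-width uses a normal approximation: z * s / sqrt(n).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    if len(ratings) < 2:
        # The sample standard deviation needs at least two ratings.
        return mos, 0.0
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight raters for one synthesized utterance.
scores = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(scores)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a "sufficiently high" score is distinguishable from a competing system rated by the same panel.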

Highlights

As a result, the attention mechanism significantly improves the network's performance both during training and during inference.
The network model consists of three components:
– a convolutional encoder that converts text features into an internal vector representation;
– a convolutional decoder with an attention mechanism that transforms the internal representation into a mel spectrogram;
– a converter that predicts vocoder parameters from the decoder states.
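The decoder's attention step can be illustrated with generic scaled dot-product attention: a decoder state (the query) is compared with every encoder time step (the keys), the scores are normalized with a softmax, and the resulting weights average the encoder outputs (the values) into a context vector. The source does not specify which attention variant is used, so the scaling by the square root of the dimension, the function names and the toy vectors are assumptions for illustration only.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Attend from one decoder query to all encoder time steps.

    query:  vector of length d (a decoder state)
    keys:   one length-d vector per encoder time step
    values: one vector per encoder time step
    Returns (context vector, attention weights over encoder steps).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Toy example: three encoder steps (e.g. three characters) with 2-dimensional features.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [0.0, 2.0]  # a hypothetical decoder state aligned with the second feature axis
context, weights = dot_product_attention(query, keys, values)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Here the second and third encoder steps receive equal, larger weights because their keys align with the query, which is how attention lets each decoder step focus on the relevant part of the input text.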
Information about the authors: Serebryanaya L.V., PhD, Associate Professor, Associate Professor at the Information Technologies Software Department, Belarusian State University of Informatics and Radioelectronics. Lasy I.E., graduate of the Information Technologies Software Department, Belarusian State University of Informatics and Radioelectronics.

Summary

AUTOMATIC RECOGNITION AND REPRESENTATION OF TEXT IN THE FORM OF AN AUDIO STREAM

Belarusian State University of Informatics and Radioelectronics (Minsk, Republic of Belarus). The problem of automatically generating speech from a text file is considered. An analytical review of software designed to recognize texts and convert them into an audio stream has been carried out. The advantages and disadvantages of these products are assessed, and on this basis it is concluded that developing software for automatically generating an audio stream from Russian text is a relevant task. Artificial neural network models used for speech synthesis are analyzed, after which a mathematical model of the software being developed is built. The software architecture is designed; it includes a graphical interface, an application server, and a speech synthesis system. Several algorithms have been developed: for preprocessing text before it is loaded into the software, for converting the audio files of the training set and training the network, and for generating speech from arbitrary text files. To assess the quality of the software, the mean opinion score, i.e. the average of ratings given by different raters, was used.
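The text preprocessing step is named but its rules are not listed. The sketch below shows one plausible normalization for Russian input under stated assumptions: lowercase the text, replace characters outside a small symbol inventory with spaces, collapse whitespace, end the utterance with terminal punctuation, and map each character to an integer ID for the encoder. The symbol inventory, `normalize` and `to_ids` are hypothetical names, not the authors' code.

```python
# Hypothetical symbol inventory: lowercase Russian letters, space and basic punctuation.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
PUNCTUATION = ".,!?-"
SYMBOLS = ["<pad>"] + list(ALPHABET) + [" "] + list(PUNCTUATION)
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}
ALLOWED = set(ALPHABET + PUNCTUATION + " ")


def normalize(text: str) -> str:
    """Lowercase, drop unsupported characters, collapse whitespace, end with punctuation."""
    text = text.lower()
    # Characters outside the inventory (digits, Latin letters, quotes, ...) become spaces.
    # A production system would expand numbers and abbreviations into words first.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    # split() with no argument also trims leading and trailing whitespace.
    text = " ".join(text.split())
    if text and text[-1] not in ".!?":
        text += "."
    return text


def to_ids(text: str) -> list[int]:
    """Map normalized text to integer symbol IDs for the encoder input."""
    return [SYMBOL_TO_ID[ch] for ch in normalize(text)]


print(normalize("Привет,   мир 2021"))
print(to_ids("Да!"))
```

Reserving ID 0 for padding lets sentences of different lengths be batched together; note that in this simplified sketch digits are discarded rather than spelled out.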

Mathematical model of the software

Application design

Software implementation

Results and discussion

References

Authors' contribution

Information about the authors

Journal: Doklady BGUIR
Publication Date: Oct 1, 2021
License type: cc-by

Similar Papers

Phonetic alignment: speech synthesis-based vs. Viterbi-based
F Malfrère ... C Ris
Speech Communication | VOL. 40
13 Sep 2002

An integrated knowledge base for speech synthesis and automatic speech recognition
M.A.A Tatham
Journal of Phonetics | VOL. 13
01 Apr 1985

Design issues in developing speech corpus for Indian languages — A survey
S Kiruthiga ... K Krishnamoorthy
01 Jan 2012

A Review on Speech Synthesis Based on Machine Learning
Ruchika Kumari ... Ashwni Kumar
01 Jan 2021
