Abstract

Speech technology currently allows live television programs to be subtitled simultaneously using speech recognition and the respeaking approach. Although many studies have examined the quality of live subtitling based on speech recognition, little attention has been paid to the quantitative elements of subtitles. Neural machine translation (NMT) has become the standard machine translation method because of its high performance. As a data-driven approach, it requires high-quality, large-scale training data and powerful computing resources to perform well, and it therefore faces challenges when translating languages with limited resources. This paper focuses on how to integrate linguistic knowledge into the NMT model to improve the translation performance and quality of the NMT system. A method for integrating semantic concept information into the NMT system is proposed to address the problem of out-of-set words and low-frequency terms. This research also provides an NMT-centered read modeling and decoding approach that integrates an external knowledge base. The experimental results show that the proposed strategy can effectively improve the translation performance of the MT system.
