Abstract

We demonstrate the potential of aligned bilingual word embeddings for developing an unsupervised method to evaluate machine translations without the need for a parallel translation corpus or a reference corpus. We explain different aspects of subtitles for digital entertainment content. We share our experimental results for four language pairs (English to French, German, Portuguese, and Spanish) and present findings on the shortcomings of Neural Machine Translation for subtitles. We propose several improvements over the system designed by Gupta et al. [P. Gupta, S. Shekhawat and K. Kumar, Unsupervised quality estimation without reference corpus for subtitle machine translation using word embeddings, IEEE 13th Int. Conf. Semantic Computing, 2019, pp. 32–38.] by incorporating a custom embedding model tailored to subtitles, compound-word splitting, and punctuation inclusion. We show a run-time improvement of the order of [Formula: see text] over their system by considering three types of edits, removing the Proximity Intensity Index (PII), and changing the post-edit score calculation.

