Abstract

A great deal of research on machine speech translation has been carried out in Japan. This article aims to give an overview of those activities and to present the most recent ones. A speech-to-speech translation (S2ST) system is basically composed of three modules: large vocabulary continuous speech recognition (Automatic Speech Recognition, ASR), machine text-to-text translation (Machine Translation, MT) and text-to-speech synthesis (TTS). All the modules need to be multilingual, so multilingual speech and corpora are required for training the models. The performance of S2ST systems improves considerably with deep learning and large training corpora. However, several aspects still need to be addressed, such as simultaneity, paralinguistics, context and situation dependency, intention and cultural dependency. We therefore review current research activities and discuss several issues related to state-of-the-art machine speech translation.

Highlights

  • The major increase in demand for cross-lingual conversations, triggered by IT technologies such as the Internet and an expanding borderless community, has fuelled research into machine speech-to-speech translation (S2ST) technology

  • The S2ST system is basically composed of three modules: large vocabulary continuous speech recognition (ASR), machine text-to-text translation (MT) and text-to-speech synthesis (TTS)

  • Neural network architectures have been shown to provide a powerful model for machine translation and speech recognition, and several recent studies have attempted to extend the models for end-to-end speech translation tasks

Summary

Introduction

The major increase in demand for cross-lingual conversations, triggered by information technologies such as the Internet and by an expanding borderless community, has fuelled research into machine speech-to-speech translation (S2ST) technology. The S2ST system is basically composed of three modules: large vocabulary continuous speech recognition (ASR), machine text-to-text translation (MT) and text-to-speech synthesis (TTS). All these modules need to be multilingual to serve users around the world, and they require multilingual speech and text corpora for training models. S2ST also needs to run efficiently and with very low latency, since it will be used for real-time communication online. From another perspective, the difficulty of S2ST depends on the degree of similarity between the source and target languages. Translating from Japanese to English requires (1) word segmentation for Japanese, because written Japanese has no explicit spacing between words, and (2) conversion into a completely different style, owing to differences in word order and coverage.
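
The three-module structure described above can be sketched as a simple cascade. This is only an illustration under stated assumptions: the class name `CascadeS2ST`, the stage type aliases and the toy stand-in functions are hypothetical and do not come from the article, and real ASR, MT and TTS modules are trained models rather than one-line functions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Recognizer = Callable[[bytes], str]   # source-language audio -> source text
Translator = Callable[[str], str]     # source text -> target text
Synthesizer = Callable[[str], bytes]  # target text -> target-language audio


@dataclass
class CascadeS2ST:
    """Chains ASR, MT and TTS in the order the text describes."""
    asr: Recognizer
    mt: Translator
    tts: Synthesizer

    def translate(self, source_audio: bytes) -> bytes:
        source_text = self.asr(source_audio)  # speech recognition
        target_text = self.mt(source_text)    # text-to-text translation
        return self.tts(target_text)          # speech synthesis


# Toy stand-ins so the sketch runs without any models.
def toy_asr(audio: bytes) -> str:
    # Pretends the "audio" is already UTF-8 encoded text.
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    # One-entry lexicon; unknown input is passed through unchanged.
    lexicon = {"こんにちは": "hello"}
    return lexicon.get(text, text)


def toy_tts(text: str) -> bytes:
    # Pretends that encoding the text is "synthesis".
    return text.encode("utf-8")


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
target_audio = pipeline.translate("こんにちは".encode("utf-8"))
```

Keeping each stage behind a plain callable makes it possible to swap one module per language pair without touching the others, which is one way to read the requirement that all modules be multilingual. It also makes visible that the stages run one after another, so every stage adds to the end-to-end latency.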

S2ST Research in Japan
Paralinguistic speech translation
Direct speech translation
Simultaneous speech translation
Automatic simultaneous S2ST
Corpus development
Concluding remarks