Abstract

Speech translation is conventionally carried out by cascading an automatic speech recognition (ASR) and a statistical machine translation (SMT) system. The hypotheses chosen for translation are based on the ASR system’s acoustic and language model scores, and are typically optimized for word error rate, ignoring the intended downstream use: automatic translation. In this paper, we present a coarse-to-fine model that uses features from the ASR and SMT systems to optimize this coupling. We demonstrate that several standard features utilized by ASR and SMT systems can be used in such a model at the speech-translation interface, and we provide empirical results on the Fisher Spanish-English speech translation corpus.

Highlights

  • Speech translation is the process of translating speech in the source language to text or speech in the target language

  • This paper presents a featurized model that selects, from the outputs of the automatic speech recognition (ASR) system, the hypotheses passed as input to the statistical machine translation (SMT) system

  • We present a general framework in which hypothesis selection can be carried out using knowledge from the ASR and the SMT system


Summary

Introduction

Speech translation is the process of translating speech in the source language to text or speech in the target language. Step three involves training and tuning a statistical machine translation (SMT) system and decoding the output extracted through the speech-translation interface. There may exist hypotheses that a trained SMT system finds easier to translate, and for which it produces better translations, than the ones deemed best by the ASR acoustic and language model scores.

Coarse-to-fine grained decoding: An intermediate model that acts as an interface and is a weak (coarse) version of the downstream process may be able to select better hypotheses. A weak translation decoder can be used as the interface to estimate the expected translation quality of an ASR hypothesis. This method of hypothesis selection should be able to incorporate features from both the ASR and the SMT system. Optimization for hypothesis selection at the speech-translation interface should be conducted using phrases, rather than words, as the basic unit.
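The idea of selecting an ASR hypothesis with features from both systems can be illustrated with a small sketch. This is not the paper's model: it assumes a plain linear scorer over an n-best list, and the feature names (`asr_acoustic`, `asr_lm`, `phrase_coverage`), the weights, and the example sentences are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One entry of an ASR n-best list with its feature values."""
    text: str
    features: Dict[str, float]


def score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    # Linear model: weighted sum of features; features without a weight count as 0.
    return sum(weights.get(name, 0.0) * value
               for name, value in hyp.features.items())


def select_hypothesis(nbest: List[Hypothesis],
                      weights: Dict[str, float]) -> Hypothesis:
    # Pick the hypothesis with the highest model score.
    return max(nbest, key=lambda h: score(h, weights))


# Hypothetical n-best list. "phrase_coverage" stands in for a coarse,
# SMT-side estimate of how translatable the hypothesis is (an assumption).
nbest = [
    Hypothesis("hola como estas",
               {"asr_acoustic": -10.0, "asr_lm": -4.0, "phrase_coverage": 1.0}),
    Hypothesis("ola como estas",
               {"asr_acoustic": -9.0, "asr_lm": -4.5, "phrase_coverage": 0.5}),
]

# Illustrative weights only; in practice they would be tuned on held-out data.
asr_only = {"asr_acoustic": 1.0, "asr_lm": 1.0}
combined = {"asr_acoustic": 1.0, "asr_lm": 1.0, "phrase_coverage": 3.0}

print(select_hypothesis(nbest, asr_only).text)
print(select_hypothesis(nbest, combined).text)
```

With ASR scores alone the second hypothesis wins, while adding the translation-side feature shifts the choice to the first. This is the kind of disagreement between ASR-best and translation-best hypotheses that the summary describes.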

Coarse-to-Fine Speech Translation
A simple model: Maximum Spanning Phrases
A general featurized model for hypothesis selection
A discussion about related techniques
Training
Features
Results
Conclusions