Abstract

Simultaneous translation, for both text and speech, targets a real-time, low-latency scenario in which the model starts translating before reading the complete source input. Evaluating simultaneous translation models is more complex than evaluating offline models because latency must be considered in addition to translation quality. Despite the research community's growing focus on novel modeling approaches to simultaneous translation, it currently lacks a universal evaluation procedure. Therefore, we present SimulEval, an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation. A server-client scheme is introduced to create a simultaneous translation scenario, where the server sends source input and receives predictions for evaluation, and the client executes customized policies. Given a policy, SimulEval automatically performs simultaneous decoding and reports several popular latency metrics. We also adapt latency metrics from simultaneous text translation to the speech task. Additionally, SimulEval is equipped with a visualization interface to provide a better understanding of a system's simultaneous decoding process. SimulEval has already been extensively used for the IWSLT 2020 shared task on simultaneous speech translation. Code will be released upon publication.

Highlights

  • While translation quality is usually measured by BLEU (Papineni et al., 2002; Post, 2018), a wide variety of latency measurements have been introduced, such as Average Proportion (AP) (Cho and Esipova, 2016), Consecutive Wait length (CW) (Gu et al., 2017), Average Lagging (AL) (Ma et al., 2019), Differentiable Average Lagging (DAL) (Cherry and Foster, 2019), and so on (see the computation sketch after this list)

  • The latency evaluation processes across different works are inconsistent: 1) the latency metric definitions are not precise enough with respect to text segmentation; 2) the definitions are not precise enough with respect to speech segmentation, e.g., some models are evaluated on the number of speech segments (Ren et al., 2020) while others are evaluated on time duration (Ansari et al., 2020); 3) little prior work has released implementations of the decoding process and latency measurement

  • While these latency metrics were originally defined for text translation, we discuss issues and solutions that arise when adapting them to the task of simultaneous speech translation
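
To make the metrics in the first bullet concrete, the sketch below computes AP and AL from a list of per-token delays g(t), i.e., the amount of source consumed when each target token is written, following the cited definitions. The function names and example values are ours, not part of SimulEval. For the speech adaptation discussed in the last bullet, the same computation applies when the delays and source length are measured in time (e.g., milliseconds of audio) rather than in tokens.

```python
def average_proportion(delays, src_len, tgt_len):
    # AP (Cho and Esipova, 2016): mean proportion of the source
    # consumed when each target token is generated.
    return sum(delays) / (src_len * tgt_len)

def average_lagging(delays, src_len, tgt_len):
    # AL (Ma et al., 2019): average amount of source the system lags
    # behind an ideal simultaneous translator, summed up to the first
    # target token emitted after the full source has been read.
    gamma = tgt_len / src_len
    total, tau = 0.0, 0
    for t, d in enumerate(delays, start=1):
        total += d - (t - 1) / gamma
        tau = t
        if d >= src_len:  # full source read; stop accumulating
            break
    return total / tau
```

For instance, a wait-3 system over a five-token source and five-token target has delays [3, 4, 5, 5, 5]; average_lagging returns 3.0 and average_proportion returns 0.88.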

Summary

Introduction

Simultaneous translation, the task of generating translations before reading the entire text or speech source input, has become an increasingly popular topic for both text and speech translation. SIMULEVAL adopts a server-client scheme to simulate this scenario: the server provides source input (text or audio) upon request from the client, receives predictions from the client, and returns different evaluation metrics when the translation process is complete (a hypothetical sketch of this loop is given below). SIMULEVAL has built-in support for quality metrics such as BLEU (Papineni et al., 2002; Post, 2018), TER (Snover et al., 2006) and METEOR (Banerjee and Lavie, 2005), and latency metrics such as AP, AL and DAL. Usage instructions and a case study are provided before concluding.
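
To illustrate the server-client interaction loop described above, here is a minimal, self-contained sketch of a client running a wait-k policy. All names (server.read, server.write, model.predict, run_client) are placeholders for illustration only and do not reflect SimulEval's actual interface.

```python
READ, WRITE = "read", "write"

def wait_k_policy(num_read, num_written, source_finished, k=3):
    """Read k source units up front, then alternate read/write."""
    if not source_finished and num_read < num_written + k:
        return READ
    return WRITE

def run_client(server, model, k=3):
    source, target = [], []
    finished = False
    while True:
        if wait_k_policy(len(source), len(target), finished, k) == READ:
            unit = server.read()          # request the next source unit
            if unit is None:              # server signals end of source
                finished = True
            else:
                source.append(unit)
        else:
            token = model.predict(source, target)  # user-defined prediction
            server.write(token)           # send prediction; server records the delay
            target.append(token)
            if token == "</s>":           # end of translation
                break
    return target
```

The key design point, as the paper's scheme suggests, is that the server alone tracks when each prediction arrives relative to the source it has released, so latency measurement stays consistent across systems regardless of how the client's policy is implemented.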

Task Formalization
Existing Text Latency Metrics
Adapting Metrics to the Speech Task
Server
User-Defined Agent
Client
Evaluation
Visualization
User-Defined Client
Case Study
Conclusion