Abstract
In this presentation, I would like to introduce the research and products of machine translation in Baidu. As the biggest Chinese search engine, Baidu has released its machine translation system in June, 2011. It now supports translations among 27 languages on multiple platforms, including PC, mobile devices, etc. Hybrid translation approach is important for building an Internet translation system. As we know, the translation demands on the Internet come from various domains, including news wires, patents, poems, idioms, etc. It is difficult for a single translation system to achieve high accuracy on all domains. Therefore, hybrid translation is practically needed. Generally, we build a statistical machine translation (SMT) system, using the training corpora automatically crawled from the web. For the translation of idioms (e.g. “有 志者,事竟成,where there is a will, there is a way”), hot words/expressions (e.g. “一带一路, One Belt and One Road ”), example-based translation methods are used. To improve the translation of date (e.g. “2012年7月6日, July 6, 2012”), numbers (e.g. “三千五百万, thirty-five million), etc, rule-based methods are used as pre-process. To improve translation quality for the resourcepoor language pairs, we used pivot-based methods. Wu and Wang (2007) proposed the triangulation method that combines the source-pivot and the pivot-target phrase tables to induce a sourcetarget phrase table. To fill up the data gap between the source-pivot and pivot-target corpora, Wu and Wang (2009) employed a hybrid method combining RBMT and SMT systems. We also proposed a method to use a Markov random walk to discover implicit relations between phrases in the source and target languages (Zhu et al., 2013), thus to improve the coverage of phrase pairs. We utilized the co-occurrence frequency of source-target phrase pairs to estimate phrase translation probabilities (Zhu et al., 2014). On May 20th this year, we have launched a neural machine translation (NMT) system for Chinese-English translation. The system conducts end-to-end translation with a source language encoder and a target language decoder. Both the encoder and decoder are recurrent neural networks. The strength of NMT lies in that it can learn semantic and structural translation information by taking global contexts into account. We further integrated the SMT and NMT system to improve translation quality. We also released off-line translation packs for NMT system on mobile devices, providing translation services in case that the Internet is unavailable. So far as we know, this is the first NMT system supporting off-line translation on mobile devices. We also investigate the problem of learning a machine translation model that can simultaneously translate sentences from one source language to multiple target languages (Dong et al., 2015). Our solution is inspired by the recently proposed neural machine translation model which generalizes machine translation as a sequence learning problem. We train a unified neural machine translation model under the multi-task learning framework where the encoder is shared across different language pairs and each target language has a separate decoder. This model gets faster and better convergence for both resource-rich and resourcepoor language pairs under the multi-task learning framework. Based on the above techniques, we have released translation products for multiple platforms, including web translation on PC, APP on mobile devices, as well as free API for the thirdparty developers. Our system now support translations among 27 languages, not only including many frequently-used foreign languages, but also
Highlights
OverviewI would like to introduce the research and products of machine translation in Baidu
Zhongjun He is a senior researcher in machine translation at Baidu
He received his Ph.D. in 2008 from Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS). He has more than ten years of research and development experiences on statistical machine translation
Summary
I would like to introduce the research and products of machine translation in Baidu. As the biggest Chinese search engine, Baidu has released its machine translation system in June, 2011 It supports translations among 27 languages on multiple platforms, including PC, mobile devices, etc. To improve translation quality for the resourcepoor language pairs, we used pivot-based methods. We train a unified neural machine translation model under the multi-task learning framework where the encoder is shared across different language pairs and each target language has a separate decoder. This model gets faster and better convergence for both resource-rich and resourcepoor language pairs under the multi-task learning framework. Rich information related to the food will be displayed, including the materials, the taste, etc
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.