Abstract

Neural machine translation (NMT) systems trained on clean data usually suffer from performance degradation when translating noisy inputs. Existing work typically improves the robustness of NMT via data augmentation, where synthetic noisy data are mixed with the original clean data, either to train the NMT model with the standard translation loss alone, or to tune auxiliary tasks in a multi-task learning manner. Typical auxiliary tasks include detecting and correcting noise, exploiting noisy outputs for contrastive learning, etc. These two auxiliary tasks are generally designed independently, and the modules for detecting and correcting noise are heavyweight. In this article, we propose a new framework, DetTransNet (Detector-Translator Network), which detects the positions of noise in the input and translates the input simultaneously. The newly introduced noise detector is a lightweight binary classifier built upon the final encoder layer of the original Transformer translation model; it identifies which positions of the input contain potential noise and adds very few parameters. To help the model capture the relationship between clean instances and their noisy counterparts, an extra loss is introduced to strengthen the interaction between clean and noisy data. In this way, noise detection and contrastive learning are combined. Since the model can locate noise, a heuristic method is further proposed to correct the detected noise and thereby achieve better translations. Experiments show that DetTransNet is robust to four types of noise (deletion, insertion, swapping, keyboard) and obtains a substantial improvement of up to 1.6 BLEU points across different datasets.
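To make the described architecture concrete, the sketch below shows one plausible form of a token-level noise detector head on top of the encoder's final layer, jointly trained with the translation loss. It is a minimal illustration assuming PyTorch; the class and function names, the weighting factor lam, and the loss combination are assumptions for exposition, not the authors' implementation, and the paper's additional clean-noisy interaction (contrastive) loss is omitted.

```python
# Minimal sketch (assumed names, not the authors' code): a lightweight
# binary classifier over final-layer encoder states plus a joint loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseDetectorHead(nn.Module):
    """For each source position, predict whether it is noisy (1) or clean (0)."""
    def __init__(self, d_model: int):
        super().__init__()
        # A single linear layer: only d_model * 2 + 2 extra parameters.
        self.proj = nn.Linear(d_model, 2)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, src_len, d_model) -> logits: (batch, src_len, 2)
        return self.proj(enc_out)

def joint_loss(nmt_loss: torch.Tensor,
               det_logits: torch.Tensor,
               noise_labels: torch.Tensor,
               pad_mask: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    """Combine the standard translation loss with the detection loss.

    noise_labels: (batch, src_len) long tensor, 1 at noisy positions, 0 elsewhere.
    pad_mask:     (batch, src_len) bool tensor, True at real (non-padding) tokens.
    """
    det_loss = F.cross_entropy(
        det_logits[pad_mask],    # (n_real_tokens, 2)
        noise_labels[pad_mask],  # (n_real_tokens,)
    )
    return nmt_loss + lam * det_loss
```

Under this reading, the detector shares the encoder with the translator, so detection adds almost no parameters, and its predicted noisy positions could then drive a heuristic correction step before or during decoding.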
