Abstract
Self-attention networks (SANs) have attracted considerable research attention for their outstanding performance in the machine translation community. Recent studies have shown that SANs can be further improved by exploiting different inductive biases, each of which guides SANs to learn a specific view of the input sentence, e.g., short-term dependencies, forward and backward views, and phrasal patterns. However, few studies have investigated how these inductive biases complement one another in improving the capability of SANs, which remains an open question. In this paper, we select five inductive biases that are simple and not over-parameterized and investigate their complementarity. We further propose multi-view self-attention networks, which jointly learn different linguistic aspects of the input sentence under a unified framework. Specifically, we propose and exploit a variety of inductive biases to regularize the conventional attention distribution. Different views are then aggregated by a hybrid attention mechanism that quantifies and leverages each view and its associated representations. Experiments on various translation tasks demonstrate that different views progressively improve the performance of SANs, and that the proposed approach outperforms both the strong Transformer baseline and related models under Transformer-base and Transformer-big settings. Extensive analyses on 10 linguistic probing tasks verify that different views indeed tend to extract distinct linguistic features, and that our method effectively integrates them. We release the code of this work at https://github.com/NLP2CT/MV-Attn.
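To make the mechanism concrete, below is a minimal sketch, not the authors' released implementation (see the repository above for that), of the general idea the abstract describes: each inductive bias regularizes the attention distribution by adding a view-specific bias to the attention logits, and a learned gate aggregates the per-view outputs. The mask functions, the module name `MultiViewSelfAttention`, and the softmax gating scheme are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of multi-view self-attention (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_window_mask(n, window, device=None):
    """Bias encoding short-term dependencies: 0 inside a +/-window band,
    -inf outside, so softmax confines attention to nearby tokens."""
    idx = torch.arange(n, device=device)
    band = (idx[None, :] - idx[:, None]).abs() <= window
    return torch.where(band, torch.zeros(n, n, device=device),
                       torch.full((n, n), float("-inf"), device=device))

def forward_mask(n, device=None):
    """Bias encoding a forward (left-to-right) view: -inf above the
    diagonal, so each token attends only to itself and earlier tokens."""
    return torch.triu(torch.full((n, n), float("-inf"), device=device),
                      diagonal=1)

class MultiViewSelfAttention(nn.Module):
    """Single-head sketch: one set of attention logits, several biased
    softmaxes (one per view), and a learned gate that mixes the views."""
    def __init__(self, d_model, n_views):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, n_views)  # hybrid aggregation weights
        self.scale = d_model ** -0.5

    def forward(self, x, view_masks):
        # x: (batch, n, d_model); view_masks: list of (n, n) logit biases.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.einsum("bid,bjd->bij", q, k) * self.scale
        views = [torch.einsum("bij,bjd->bid",
                              F.softmax(logits + m, dim=-1), v)
                 for m in view_masks]                     # one output per view
        w = F.softmax(self.gate(x), dim=-1)               # (batch, n, n_views)
        stacked = torch.stack(views, dim=-1)              # (batch, n, d, views)
        return torch.einsum("bndv,bnv->bnd", stacked, w)  # gated combination

# Example: combine a short-term (local) view and a forward view.
# x = torch.randn(2, 7, 64)
# mv = MultiViewSelfAttention(64, n_views=2)
# out = mv(x, [local_window_mask(7, window=2), forward_mask(7)])
```

Adding biases to the logits rather than replacing the attention module keeps each view a regularized variant of the same underlying attention, which is what lets the gate compare and combine views over a shared representation.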