Sparse Communication for Distributed Gradient Descent

Abstract

We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed, as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero and then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas different configurations reduce the convergence rate on the more complex translation task. Our experiments show that we can achieve up to a 49% speed-up on MNIST and 22% on NMT without damaging the final accuracy or BLEU.
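The dropping step described above can be sketched in a few lines: keep only the largest-magnitude fraction of gradient components and transmit them as (index, value) pairs. This is a minimal pure-Python sketch; the function name and the residual bookkeeping (retaining dropped values locally) are illustrative assumptions, not details stated in the abstract.

```python
import heapq

def sparsify(grad, drop_ratio=0.99):
    """Zero out the `drop_ratio` smallest gradient components by magnitude.

    Returns the surviving components as sorted (index, value) pairs --
    the sparse update to exchange -- plus the dense residual of dropped
    values (keeping the residual locally is an illustrative assumption).
    """
    k = max(1, int(len(grad) * (1.0 - drop_ratio)))
    # indices of the k entries with the largest absolute value
    kept = set(heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i])))
    sparse = [(i, grad[i]) for i in sorted(kept)]
    residual = [0.0 if i in kept else g for i, g in enumerate(grad)]
    return sparse, residual

# Toy example: 10 components, drop the smallest 90% (keep 1 value).
g = [0.01, -0.002, 0.3, 0.004, -0.05, 0.0, 0.02, -0.01, 0.006, 0.001]
sparse, residual = sparsify(g, drop_ratio=0.9)
print(sparse)  # [(2, 0.3)] -- only the largest-magnitude component survives
```

At scale, each worker would send its sparse list (optionally quantized, as the abstract notes) instead of a dense gradient, cutting communication by roughly 100x at 99% sparsity before quantization.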

Similar Papers
  • Research Article
  • Cited by 2
  • 10.15398/jlm.v7i2.214
Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages
  • Sep 16, 2019
  • Journal of Language Modelling
  • Saurav Jha + 2 more

Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi - Bhojpuri words. We demonstrate that our models can be effectively used for language pairs that have limited parallel corpora; our models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We describe the training aspects of several character-level NMT systems that we adapted to this task and characterize their typical errors. Our method improves BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying our method successfully to Hindi - Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks; (ii) creating effective parallel corpora for resource-constrained languages; and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks.

  • Book Chapter
  • 10.1007/978-3-031-30105-6_10
An Effective Ensemble Model Related to Incremental Learning in Neural Machine Translation
  • Jan 1, 2023
  • Pumeng Shi

In recent years, machine translation has made great progress with the rapid development of deep learning. However, the problem of catastrophic forgetting persists in neural machine translation: overall performance decreases when new data are added incrementally during training. Many incremental-learning methods have been proposed to solve this problem in computer vision tasks, but few for machine translation. In this paper, we first apply several prevailing incremental-learning methods to the machine translation task, then propose an ensemble model to deal with catastrophic forgetting, and finally use several important and authoritative metrics to evaluate model performance in our experiments. The results show that incremental learning is also effective in neural machine translation and that the proposed ensemble model improves model performance to some extent.

  • Conference Article
  • Cited by 3
  • 10.18653/v1/2021.blackboxnlp-1.9
Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
  • Jan 1, 2021
  • Rahma Chaabouni + 2 more

Despite their practical success, modern seq2seq architectures are unable to generalize systematically on several SCAN tasks. Hence, it is not clear if SCAN-style compositional generalization is useful in realistic NLP tasks. In this work, we study the benefit that such compositionality brings about to several machine translation tasks. We present several focused modifications of Transformer that greatly improve generalization capabilities on SCAN and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance in low-resource settings and on a newly introduced distribution-shifted English-French translation task. Overall, we find that improvements of a SCAN-capable model do not directly transfer to the resource-rich MT setup. In contrast, in the low-resource setup, general modifications lead to an improvement of up to 13.1% BLEU score w.r.t. a vanilla Transformer. Similarly, an improvement of 14% in an accuracy-based metric is achieved in the introduced compositional English-French translation task. This provides experimental evidence that the compositional generalization assessed in SCAN is particularly useful in resource-starved and domain-shifted scenarios.

  • Research Article
  • Cited by 4
  • 10.3390/electronics12163391
Research on the Application of Prompt Learning Pretrained Language Model in Machine Translation Task with Reinforcement Learning
  • Aug 9, 2023
  • Electronics
  • Canjun Wang + 4 more

With the continuous advancement of deep learning technology, pretrained language models have emerged as crucial tools for natural language processing tasks. However, optimization of pretrained language models is essential for specific tasks such as machine translation. This paper presents a novel approach that integrates reinforcement learning with prompt learning to enhance the performance of pretrained language models in machine translation tasks. In our methodology, a “prompt” string is incorporated into the input of the pretrained language model, to guide the generation of an output that aligns closely with the target translation. Reinforcement learning is employed to train the model in producing optimal translation results. During this training process, the target translation is utilized as a reward signal to incentivize the model to generate an output that aligns more closely with the desired translation. Experimental results validated the effectiveness of the proposed approach. The pretrained language model trained with prompt learning and reinforcement learning exhibited superior performance compared to traditional pretrained language models in machine translation tasks. Furthermore, we observed that different prompt strategies significantly impacted the model’s performance, underscoring the importance of selecting an optimal prompt strategy tailored to the specific task. The results suggest that using techniques such as prompt learning and reinforcement learning can improve the performance of pretrained language models for tasks such as text generation and machine translation. The method proposed in this paper not only offers a fresh perspective on leveraging pretrained language models in machine translation and other related tasks but also serves as a valuable reference for further research in this domain. 
By combining reinforcement learning with prompt learning, researchers can explore new avenues for optimizing pretrained language models and improving their efficacy in various natural language processing tasks.

  • Research Article
  • Cited by 6
  • 10.1166/jctn.2019.8331
Neural Machine Translation: A Review of the Approaches
  • Aug 1, 2019
  • Journal of Computational and Theoretical Nanoscience
  • Kamya Eria + 1 more

Neural Machine Translation (NMT) has presented promising results in machine translation, convincingly replacing traditional Statistical Machine Translation (SMT). This success therefore projects to more translation tasks using NMT. This paper systematically reviews the NMT systems proposed since 2014; 86 NMT papers have been selected and reviewed. The number of proposed NMT systems peaked in 2016, as did the number of machine translation workshops providing datasets for NMT tasks. Most of the proposed systems covered English, German, French, and Chinese translation tasks. The BLEU score, accompanied by significance tests, has been seen to be the best metric for NMT system evaluation. Human judgement of fluency and adequacy is also important to support the metrics. There is still room for further improvement in translations regarding rich source translations and rare words. There is also a need for extensive NMT work in other languages to maximize the apparent capabilities of NMT systems. RNN Search and Moses are commonly used to develop SMT baselines for model comparisons. Results provide futuristic and directional insights into further translation tasks.

  • Book Chapter
  • 10.1007/978-3-030-32381-3_26
Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training
  • Jan 1, 2019
  • Yichao Cao + 3 more

Neural Machine Translation (NMT) has recently achieved the state of the art in many machine translation tasks, but one of the challenges that NMT faces is the lack of parallel corpora, especially for low-resource language pairs. As a result, NMT is much less effective for low-resource languages. To address this specific problem, in this paper, we describe a novel NMT model that is based on an encoder-decoder architecture and relies on character-level inputs. Our proposed model employs Convolutional Neural Networks (CNN) and highway networks over character inputs, whose outputs are given to an encoder-decoder neural machine translation network. In addition, we present two other approaches to further improve the performance of the low-resource NMT system. First, we use language modeling implemented by denoising autoencoding to pre-train and initialize the full model. Second, we share the weights of the front few layers of the two encoders between the two languages to strengthen the encoding ability of the model. We demonstrate our model on two low-resource language pairs. On the IWSLT2015 English-Vietnamese translation task, our proposed model obtains improvements of up to 2.5 BLEU points compared to the baseline. We also outperform the baseline by more than 3 BLEU points on the CWMT2018 Chinese-Mongolian translation task.

  • Research Article
  • Cited by 111
  • 10.1109/tpami.2018.2876404
Neural Machine Translation with Deep Attention.
  • Oct 16, 2018
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Biao Zhang + 2 more

Deepening neural models has proven very successful in improving model capacity when solving complex learning tasks, such as the machine translation task. Previous efforts on deep neural machine translation mainly focus on the encoder and the decoder, with little work on the attention mechanism. However, the attention mechanism is of vital importance for inducing the translation correspondence between different languages, where shallow neural networks are relatively insufficient, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on the low-level attention information, DeepAtt is capable of automatically determining what should be passed or suppressed from the corresponding encoder layer so as to make the distributed representation appropriate for high-level attention and translation. We conduct experiments on NIST Chinese-English, WMT English-German, and WMT English-French translation tasks, where, with five attention layers, DeepAtt yields very competitive performance against the state-of-the-art results. We empirically find that with an adequate increase of attention layers, DeepAtt tends to produce more accurate attention weights. An in-depth analysis of the translation of important context words further reveals that DeepAtt significantly improves the faithfulness of system translations.

  • Conference Article
  • Cited by 14
  • 10.18653/v1/d19-5619
On the Importance of Word Boundaries in Character-level Neural Machine Translation
  • Jan 1, 2019
  • Duygu Ataman + 4 more

Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically using external tools with arbitrary heuristics, resulting in vocabulary units not optimized for the translation task. Recent studies have shown that the same approach can be extended to perform NMT directly at the level of characters, which can deliver translation accuracy on par with subword-based models; on the other hand, this requires relatively deeper networks. In this paper, we propose a more computationally efficient solution for character-level NMT which implements a hierarchical decoding architecture where translations are generated first at the level of words and then at the level of characters. We evaluate different methods for open-vocabulary NMT on the machine translation task from English into five languages with distinct morphological typology, and show that the hierarchical decoding model can reach higher translation accuracy than the subword-level NMT model using significantly fewer parameters, while demonstrating better capacity for learning longer-distance contextual and grammatical dependencies than the standard character-level NMT model.

  • Research Article
  • Cited by 8
  • 10.1109/taslp.2021.3097939
Attending From Foresight: A Novel Attention Mechanism for Neural Machine Translation
  • Jan 1, 2021
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Xintong Li + 5 more

Machine translation (MT) is an essential task in natural language processing and artificial intelligence. Statistical machine translation was the dominant approach to MT for decades, but neural machine translation has recently attracted increasing interest because of its appealing model architecture and impressive translation performance. In neural machine translation, an attention model is used to identify the aligned source words for the next target word, i.e., the target foresight word, in order to select translation context. However, it does not make use of any information about this target foresight word at all. Previous work proposed an approach to improve the attention model by explicitly accessing this target foresight word and demonstrated substantial gains in alignment tasks. However, this approach cannot be applied in machine translation tasks, where the target foresight word is unavailable. This paper proposes several novel enhanced attention models that introduce hidden information (such as part-of-speech) of the target foresight word for the translation task. We incorporate the enhanced attention employing hidden information about the target foresight word into both recurrent and self-attention-based neural translation models and theoretically justify that such hidden information can make translation prediction easier. Empirical experiments on four datasets further verify that the proposed attention models deliver significant improvements in translation quality.

  • Research Article
  • Cited by 8
  • 10.1155/2021/1244389
A Study on the Intelligent Translation Model for English Incorporating Neural Network Migration Learning
  • Jan 1, 2021
  • Wireless Communications and Mobile Computing
  • Yanbo Zhang

Under the current artificial intelligence boom, machine translation is a research direction of natural language processing with important scientific and practical value. In practical applications, the variability of language, the limited capacity for representing semantic information, and the scarcity of parallel corpus resources all constrain machine translation from becoming practical and widespread. In this paper, we conduct deep mining of source-language text data to express complex, high-level, and abstract semantic information using an appropriate text data representation model. Then, for machine translation tasks with a large parallel corpus, we use annotated datasets to build a more effective migration-learning-based end-to-end neural network machine translation model with a supervised algorithm. Next, for machine translation tasks in languages with resource-poor parallel corpus data, migration learning techniques are used to prevent the overfitting problem of neural networks during training and to improve the generalization ability of end-to-end neural network machine translation models under low-resource conditions. Finally, for language translation tasks where the parallel corpus is extremely scarce but a monolingual corpus is sufficient, the research focuses on unsupervised machine translation techniques, which will be a future research trend.

  • Research Article
  • Cited by 18
  • 10.1109/taslp.2021.3138714
Integrating Prior Translation Knowledge Into Neural Machine Translation
  • Jan 1, 2022
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Kehai Chen + 3 more

Neural machine translation (NMT), which is an encoder-decoder joint neural language model with an attention mechanism, has achieved impressive results on various machine translation tasks in the past several years. However, the language model attribute of NMT tends to produce fluent yet sometimes unfaithful translations, which hinders the improvement of translation capacity. In response to this problem, we propose a simple and efficient method to integrate prior translation knowledge into NMT in a universal manner that is compatible with neural networks. Meanwhile, it enables NMT to consider cross-language translation knowledge from the source side of the training pipeline, thereby making full use of the prior translation knowledge to enhance the performance of NMT. Experimental results on two large-scale benchmark translation tasks demonstrate that our approach achieves a significant improvement over a strong baseline.

  • Research Article
  • Cited by 2
  • 10.1142/s2196888823500148
Exploring Composite Indexes for Domain Adaptation in Neural Machine Translation
  • Sep 23, 2023
  • Vietnam Journal of Computer Science
  • Nhan Vo Minh + 3 more

Domain adaptation in neural machine translation (NMT) tasks often involves working with datasets that have a different distribution from the training data. In such scenarios, k-nearest-neighbor machine translation (kNN-MT) has been shown to be effective in retrieving relevant information from large datastores. However, the high-dimensional context vectors of large neural machine translation models result in high computational costs for distance computation and storage. To address this issue, index optimization techniques have been proposed, including the combined use of an inverted file index (IVF) and product vector quantization (PQ), called IVFPQ. In this paper, we explore recent index techniques for efficient machine translation domain adaptation and combine multiple index structures to improve the efficiency of nearest-neighbor search in domain adaptation datasets for the machine translation task. Specifically, we evaluate the effectiveness of combining optimized product quantization (OPQ) and hierarchical navigable small-world (HNSW) indexing with IVFPQ. Our study aims to provide insights into the most suitable composite index methods for efficient nearest-neighbor search in domain adaptation datasets, with a focus on improving both accuracy and speed.
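To make the index terminology above concrete, here is a toy pure-Python sketch of the inverted-file (IVF) part alone: vectors are bucketed under their nearest coarse centroid, and a query scans only the `nprobe` closest buckets. Real IVFPQ (e.g., as implemented in faiss) additionally compresses the stored vectors with product quantization, while OPQ and HNSW refine the quantization and the coarse search respectively; all class and parameter names below are illustrative, not from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class IVFIndex:
    """Toy inverted-file index: search visits only the nearest bucket(s),
    trading a little recall for far fewer distance computations."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def add(self, vid, vec):
        # store the vector in the bucket of its nearest coarse centroid
        c = min(range(len(self.centroids)),
                key=lambda i: sq_dist(vec, self.centroids[i]))
        self.lists[c].append((vid, vec))

    def search(self, query, nprobe=1):
        # rank buckets by centroid distance, then brute-force the top nprobe
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(query, self.centroids[i]))
        cands = [item for c in order[:nprobe] for item in self.lists[c]]
        if not cands:
            return None
        return min(cands, key=lambda iv: sq_dist(query, iv[1]))[0]

idx = IVFIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
idx.add(0, (1.0, 1.0))
idx.add(1, (9.0, 9.0))
idx.add(2, (0.5, 0.2))
print(idx.search((0.9, 1.1)))  # 0 -- found by scanning only one bucket
```

With thousands of buckets, each query computes distances against the centroids plus one bucket's contents rather than the whole datastore, which is the speed/accuracy trade-off the composite indexes in the paper tune.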

  • Conference Article
  • Cited by 30
  • 10.18653/v1/n18-1125
Target Foresight Based Attention for Neural Machine Translation
  • Jan 1, 2018
  • Xintong Li + 4 more

In neural machine translation, an attention model is used to identify the aligned source words for a target word (the target foresight word) in order to select translation context, but it does not make use of any information about this target foresight word at all. Previous work proposed an approach to improve the attention model by explicitly accessing this target foresight word and demonstrated substantial gains in the alignment task. However, this approach cannot be applied in the machine translation task, where the target foresight word is unavailable. In this paper, we propose a new attention model enhanced by implicit information about the target foresight word, oriented to both alignment and translation tasks. Empirical experiments on Chinese-to-English and Japanese-to-English datasets show that the proposed attention model delivers significant improvements in terms of both alignment error rate and BLEU.

  • Research Article
  • Cited by 12
  • 10.3390/fi12060096
Multi-Source Neural Model for Machine Translation of Agglutinative Language
  • Jun 3, 2020
  • Future Internet
  • Yirong Pan + 3 more

Benefitting from the rapid development of artificial intelligence (AI) and deep learning, the machine translation task based on neural networks has achieved impressive performance in many high-resource language pairs. However, the neural machine translation (NMT) models still struggle in the translation task on agglutinative languages with complex morphology and limited resources. Inspired by the finding that utilizing the source-side linguistic knowledge can further improve the NMT performance, we propose a multi-source neural model that employs two separate encoders to encode the source word sequence and the linguistic feature sequences. Compared with the standard NMT model, we utilize an additional encoder to incorporate the linguistic features of lemma, part-of-speech (POS) tag, and morphological tag by extending the input embedding layer of the encoder. Moreover, we use a serial combination method to integrate the conditional information from the encoders with the outputs of the decoder, which aims to enhance the neural model to learn a high-quality context representation of the source sentence. Experimental results show that our approach is effective for the agglutinative language translation, which achieves the highest improvements of +2.4 BLEU points on Turkish–English translation task and +0.6 BLEU points on Uyghur–Chinese translation task.
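The "extending the input embedding layer" idea above can be illustrated with a tiny sketch: each input position receives the concatenation of its word, lemma, and POS-tag embeddings, so the encoder sees the linguistic features alongside the surface form. The vocabularies, dimensions, and random initialization below are all illustrative assumptions, not details from the paper.

```python
import random

random.seed(0)

def make_embeddings(vocab, dim):
    """Illustrative lookup table: one random vector per token."""
    return {tok: [random.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}

# Separate tables per feature stream (toy vocabularies and sizes).
word_emb = make_embeddings(["dogs", "run"], dim=4)
lemma_emb = make_embeddings(["dog", "run"], dim=2)
pos_emb = make_embeddings(["NOUN", "VERB"], dim=2)

def embed_token(word, lemma, pos):
    """Concatenate the feature embeddings into one 8-dim input vector."""
    return word_emb[word] + lemma_emb[lemma] + pos_emb[pos]

v = embed_token("dogs", "dog", "NOUN")
print(len(v))  # 8 = 4 (word) + 2 (lemma) + 2 (POS)
```

In a full model these concatenated vectors would feed the additional encoder; here the sketch only shows how the input embedding layer is extended with linguistic features.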

  • Conference Article
  • 10.28995/2075-7182-2023-22-1141-1149
Pre-editing Strategy Based on Automatic Evaluation of Translation Complexity to Improve the Quality of Specialized Texts Machine Translation into English
  • Jun 19, 2023
  • Alena A Zhivotova + 1 more

The study addresses the issue of applying optimizing pre-editing to Russian-language texts in order to improve the quality of machine translation into English. A probabilistic assessment of translation task complexity is proposed for selecting a pre-editing strategy. A generalized model of the translation process is presented, along with a mathematical model and an algorithm for automated assessment of translation task complexity. Testing of the model on specialized oil and gas industry texts is described, showing that the estimate correlates with an estimate of translation quality and can be used in selecting a strategy for optimizing pre-editing of source texts in machine translation tasks.
