Abstract

A dramatic improvement in the ability of language models to deliver state-of-the-art accuracy on a number of Natural Language Processing tasks is currently being observed. These improvements open up many possibilities for solving NLP downstream tasks, including machine translation, speech recognition, information retrieval, sentiment analysis, summarization, question answering, multilingual dialogue systems, and many more. Language models are one of the most important components in solving each of these tasks. This paper is devoted to the research and analysis of the most widely adopted techniques and designs for building and training language models that show state-of-the-art results. The techniques and components applied in creating language models and their parts are reviewed, with attention to neural networks, embedding mechanisms, bidirectionality, the encoder-decoder architecture, attention and self-attention, as well as parallelization through the Transformer. Results: the most promising techniques involve pre-training and fine-tuning of a language model, an attention-based neural network as part of the model design, and a complex ensemble of multidimensional embeddings to build deep contextual understanding. The latest architectures based on these approaches require a great deal of computational power to train, and reducing this cost is a direction for further improvement. Ref. 49, fig. 13.
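As a concrete illustration of the attention mechanism mentioned in the abstract, the following is a minimal sketch of scaled dot-product self-attention for a single head, written in NumPy. The array shapes, projection matrices, and the softmax helper are illustrative assumptions, not the exact formulation analysed in the paper.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax along the given axis.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k) projections.
        Q = X @ Wq                                 # queries
        K = X @ Wk                                 # keys
        V = X @ Wv                                 # values
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token to every other
        weights = softmax(scores, axis=-1)         # attention distribution per token
        return weights @ V                         # context-aware representations

    # Toy usage: 4 tokens with 8-dimensional embeddings (sizes are arbitrary).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)

Scaling the scores by the square root of the key dimension keeps the softmax from saturating as the embedding size grows, and because every token attends to every other token in a single matrix operation, the computation parallelizes well, which is what the Transformer exploits.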

Highlights

  • Natural Language Processing (NLP) is computer comprehension, analysis, manipulation, and generation of natural language

  • The Decoder takes the output of the Encoder and uses it to generate output predictions (see the sketch after this list)

  • Architecture based on Transformers advances even further and shows state-of-the-art results on most NLP tasks
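
The encoder-decoder flow referenced in the second highlight can be sketched as follows, assuming a plain recurrent encoder whose final hidden state (the encoder vector) initialises a decoder that feeds its own previous output back in at each step. All weight names, sizes, and the greedy decoding loop are illustrative assumptions, not the architecture evaluated in the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 16                                        # hidden / embedding size (illustrative)
    V = 10                                        # toy vocabulary size

    # Illustrative parameters for a plain RNN encoder and decoder.
    W_enc = rng.normal(scale=0.1, size=(2 * d, d))
    W_dec = rng.normal(scale=0.1, size=(2 * d, d))
    W_out = rng.normal(scale=0.1, size=(d, V))
    embed = rng.normal(scale=0.1, size=(V, d))    # shared token embeddings

    def rnn_step(x, h, W):
        # One recurrent step: combine the current input with the previous hidden state.
        return np.tanh(np.concatenate([x, h]) @ W)

    def encode(src_ids):
        # The encoder compresses the source sequence into a single encoder vector.
        h = np.zeros(d)
        for t in src_ids:
            h = rnn_step(embed[t], h, W_enc)
        return h

    def decode(encoder_vector, start_id=0, max_len=5):
        # The decoder starts from the encoder vector and feeds each predicted
        # token back in as the next input ("previous decoder outputs").
        h, token, out = encoder_vector, start_id, []
        for _ in range(max_len):
            h = rnn_step(embed[token], h, W_dec)
            token = int(np.argmax(h @ W_out))     # greedy prediction
            out.append(token)
        return out

    print(decode(encode([3, 7, 2])))              # a list of 5 predicted token ids

In Transformer-based models the recurrent steps above are replaced by self-attention layers, but the overall flow stays the same: encoder states are produced from the input, and the decoder conditions on them together with its previous outputs to generate predictions.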


Summary

Introduction

Natural Language Processing (NLP) is the computer comprehension, analysis, manipulation, and generation of natural language. The ultimate goal for scientists working in the area of Natural Language Processing is to build techniques that allow computers to comprehend natural language, as text or speech, at a human level, which has not yet been reached. Computational linguistics researchers have proposed many language models, and these new techniques have shown great results on specific tasks; however, when it comes to the broad spectrum of NLP tasks solved by the same language model, not many show equally high results. The structure of this article is as follows: language model architectures are reviewed in section 2; word and contextual embedding techniques are described in section 3; the Encoder-Decoder architecture in section 4; and neural networks applied as part of the Encoder and Decoder in the section that follows.

The remaining sections and figure labels cover: the most common approach to NLP downstream tasks; word and contextual embedding approaches; the encoder vector and hidden state; update and output gates; the feed-forward layer and previous decoder outputs; and the conclusion.