Abstract

The attention mechanism is a powerful and effective method widely used in natural language processing: it allows a model to focus on the most relevant parts of the input sequence. The Transformer model relies on attention in place of recurrent and convolutional neural networks, eliminating the need for increasingly complex operations as the distance between words in a sequence grows. However, the attention mechanism itself is insensitive to positional information, so positional encoding is crucial for Transformer-like models that rely heavily on attention. To make such models position-aware, information about the position of each input word is typically added to the input token embeddings as an additional embedding. The purpose of this paper is to conduct a systematic study of different positional encoding methods. We briefly describe the components of the attention mechanism, its role in the Transformer model, and the encoder-decoder architecture of the Transformer. We also study how sharing position encodings across the heads and layers of a Transformer affects model performance. The methodology of the study is based on general research methods of analysis and synthesis, experimental testing, and quantitative analysis to comprehensively examine and compare the efficacy and performance of different positional encoding techniques used in Transformer models. The results show that absolute and relative encodings yield similar overall model performance, while relative encodings perform considerably better on longer sentences. We found that the original encoder-decoder form worked best for the tasks of machine translation and question answering. Despite using twice as many parameters as "encoder-only" or "decoder-only" architectures, an encoder-decoder model has a similar computational cost. Moreover, the number of learnable parameters can often be reduced without loss of performance.

Practical implications. Positional encoding is essential for enabling Transformer models to process data effectively by preserving sequence order, handling variable-length sequences, and improving generalization. Its inclusion contributes significantly to the success of Transformer-based architectures in a variety of natural language processing tasks.

Value/originality. Positional encoding is a critical issue for Transformer-like models, yet how it establishes positional dependencies within a sequence has not been fully explored. We analyze several approaches to position encoding in the context of question answering and machine translation tasks because the influence of positional encoding on how NLP models handle word order remains ambiguous and requires further exploration.
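As an illustration of the absolute approach discussed above, the following is a minimal sketch of the standard sinusoidal positional encoding from the original Transformer paper being added to token embeddings. It is not the authors' experimental code; the array shapes, the random toy embeddings, and the function name are illustrative assumptions.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute sinusoidal position encodings (Vaswani et al., 2017).

    Returns an array of shape (seq_len, d_model) in which even dimensions
    use sine and odd dimensions use cosine at geometrically spaced frequencies.
    """
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                          # (seq_len, d_model)

    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])               # even dimensions
    encoding[:, 1::2] = np.cos(angles[:, 1::2])               # odd dimensions
    return encoding


# Toy token embeddings; in practice these come from a learned embedding table.
seq_len, d_model = 8, 16
token_embeddings = np.random.randn(seq_len, d_model)

# The model is made position-aware by adding the encodings to the token
# embeddings before the first attention layer.
position_aware_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(position_aware_input.shape)  # (8, 16)
```

Relative schemes, by contrast, typically inject position information inside the attention computation itself (for example, as a bias that depends on the distance between query and key positions) rather than into the input embeddings.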
