Abstract

Attention mechanisms have been incorporated into many neural network-based natural language processing (NLP) models. They enhance the ability of these models to learn and reason with long input texts. A critical part of such mechanisms is the computation of attention similarity scores between two elements of the texts using a similarity score function. Given that these models have different architectures, it is difficult to comparatively evaluate the effectiveness of different similarity score functions. In this paper, we propose a baseline model that captures the common components of recurrent neural network-based question answering (QA) systems found in the literature. By isolating the attention function, this baseline model allows us to study the effects of different similarity score functions on the performance of such systems. Experimental results show that a trilinear function produced the best results among the commonly used functions. Based on these insights, a new T-trilinear similarity function is proposed, which achieves higher predictive EM and F1 scores than these existing functions. A heatmap visualization of the attention score matrix explains why the T-trilinear function is effective.
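The exact form of the proposed T-trilinear function is not given in this summary, but the trilinear function it builds on is the one popularized by BiDAF-style readers. Below is a minimal NumPy sketch of that baseline score, assuming d-dimensional word vectors and a learned weight vector w; it is an illustration of the standard trilinear form, not the paper's implementation:

```python
import numpy as np

def trilinear_score(c, q, w):
    """Trilinear similarity as used in BiDAF-style readers:
    sim(c, q) = w^T [c ; q ; c * q].
    c, q: (d,) passage/question word vectors; w: (3d,) learned weights."""
    return w @ np.concatenate([c, q, c * q])

def score_matrix(P, Q, w):
    """P: (n, d) passage vectors, Q: (m, d) question vectors -> S: (n, m)
    with S[i, j] = trilinear similarity of passage word i and question word j."""
    return np.array([[trilinear_score(P[i], Q[j], w)
                      for j in range(Q.shape[0])]
                     for i in range(P.shape[0])])
```

The element-wise product term c * q is what lets the score capture direct feature-by-feature interactions between the two words, beyond what concatenation alone provides.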

Highlights

  • In natural language processing (NLP), handling long sequences of words is a crucial and challenging task

  • The output of the attention mechanism is passed into a recurrent neural network (see the sketch after these highlights)

  • The proposed T-trilinear function shows a similar trend to feedforward neural network (FNN) and concat-FNN in the initial stages
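As a concrete illustration of the second highlight, the sketch below assumes a PyTorch pipeline in which the attention layer's question-aware passage representation G is consumed by a bidirectional LSTM; all sizes are invented for the example and are not from the paper:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: feat_dim is the width of the fused attention output,
# hidden is the recurrent encoder's hidden size.
feat_dim, hidden = 400, 100
rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)

G = torch.randn(2, 50, feat_dim)  # dummy attention output: (batch, words, feat)
M, _ = rnn(G)                     # M: (2, 50, 2 * hidden), passed to answer prediction
```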


Summary

Introduction

In natural language processing (NLP), handling long sequences of words is a crucial and challenging task. Attention mechanisms are used to circumvent this problem by focusing on the information that is most relevant to the target. Their success has made them an indispensable part of neural network-based NLP models for machine translation [4, 5], machine reading comprehension [6, 7, 8, 9], sentiment analysis [10, 11], and question answering (QA) [12, 13]. We can say that the words in the question pay attention to different parts of the passage that are considered most relevant. This method has been used successfully in many recurrent neural network (RNN)-based QA systems [6, 7, 14, 15, 16].
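As a minimal illustration of this idea (using generic dot-product scoring purely for the example, not any particular function from the paper), each question word can attend over the passage as follows:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_to_passage_attention(Q, P):
    """Q: (m, d) question word vectors, P: (n, d) passage word vectors.
    Each question word attends over all passage words and collects a
    weighted summary of the passage most relevant to it."""
    S = Q @ P.T             # (m, n) attention score matrix
    A = softmax(S, axis=1)  # row-wise distribution over passage words
    return A @ P            # (m, d) attended passage representations
```

Swapping the dot product in S for a different similarity score function is exactly the design choice the paper isolates and evaluates.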


