Abstract
The field of natural language processing (NLP) has made significant progress with deep learning models built on multi-head attention, such as the Transformer and BERT. However, this approach has two major limitations: the number of attention heads is typically set manually based on empirical experience, and the mechanism offers limited semantic understanding and interpretability. In this study, we propose a novel attention mechanism, Factor Analysis-based Multi-head (FAM) Attention, which combines the theory of exploratory factor analysis with word embeddings. Experimental results show that FAM Attention achieves better performance with fewer parameters than traditional methods, while also providing stronger semantic understanding and token-level interpretability. These findings also have significant implications for current Large Language Models (LLMs), particularly for reducing parameter counts and improving performance.