Abstract

Self-attention is the core mechanism of the transformer, an architecture that enables models to understand sentences and longer texts. Transformer research is advancing rapidly, but the internal workings of these models remain poorly understood. In this work, the self-attention weights of several transformers are visualized and examined. From these observations, five types of self-attention connection are identified, and the corresponding heads are classified as Parallel, Radioactive, Homogeneous, X-type, and Compound self-attention heads. The Parallel self-attention head is the most important, and the combination of the different types affects the transformer's performance. The visualizations indicate where each type is located in the model. The results suggest that when some Homogeneous heads are made more varied, the model performs better. A new training method, the local head training method, is proposed, and it may be useful when training transformers. The purpose of this study is to lay a foundation for model biology, to offer additional perspectives for understanding transformers, and to refine training methods.
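As a concrete illustration of the kind of visualization described above, the sketch below shows one common way to extract and plot per-head self-attention weights. It is not the authors' code; it assumes a pretrained encoder from the Hugging Face `transformers` library (here `bert-base-uncased` as an arbitrary example) and `matplotlib` for plotting.

```python
# Minimal sketch: extract and plot one self-attention head's weight matrix.
# Assumptions (not from the paper): Hugging Face transformers, bert-base-uncased.
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # any encoder with attention outputs works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

sentence = "Self-attention lets every token attend to every other token."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
layer, head = 0, 0                          # pick one head to inspect
attn = outputs.attentions[layer][0, head]   # (seq_len, seq_len) attention weights

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.xlabel("attended-to token")
plt.ylabel("attending token")
plt.title(f"Layer {layer}, head {head} self-attention")
plt.tight_layout()
plt.show()
```

Plotting such heatmaps for every layer and head is one way the connection patterns (parallel, radial, homogeneous, X-shaped, or compound) described in the abstract could be inspected visually.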
