The integration of Spiking Neural Networks (Maass, 1997) and Transformers (Vaswani et al., 2017) has yielded substantial performance gains while preserving energy efficiency, making this combination a promising direction for further exploration. This paper aims to strengthen the network's capacity to extract temporal information from neuromorphic datasets. Building on the Spikformer (Zhou et al., 2023b) architecture, we introduce a novel network named TE-Spikformer. Through detailed analysis, we identify an imbalance across time steps in the Average Firing Rate (AFR) of neurons in the layer preceding the classification head, and we trace its root cause to limitations of the network's Batch Normalization (BN) (Ioffe and Szegedy, 2015) layer. To address this issue, we propose a Batch Group Normalization (BGN) layer, which stabilizes the temporal characteristics of the data, and we further introduce a Spike Spatio-Temporal Attention (SSTA) module to enhance the network's ability to capture temporal information. To validate the effectiveness of our approach, we conduct extensive experiments on the neuromorphic datasets DVS128 Gesture, DVS-CIFAR10, and N-Caltech101. The results show that our method consistently outperforms baseline approaches, achieving accuracies of 99.30%, 89.60%, and 87.42%, respectively, attaining state-of-the-art performance.
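
For intuition only, the sketch below illustrates one plausible reading of a batch-group normalization layer for spiking feature maps: the simulation time steps are split into groups and each group is normalized with its own BatchNorm statistics, instead of pooling all time steps as plain BN does, so that firing statistics can differ between early and late steps. The class name TemporalBatchGroupNorm, the grouping scheme, and the (T, B, C, H, W) tensor layout are illustrative assumptions and not the paper's actual BGN definition.

# Minimal sketch, assuming BGN groups time steps and gives each group its own
# BatchNorm2d; this is an illustrative interpretation, not the paper's code.
import torch
import torch.nn as nn


class TemporalBatchGroupNorm(nn.Module):
    """Applies a separate BatchNorm2d to each group of time steps.

    Input shape: (T, B, C, H, W) -- T time steps, batch B, channels C.
    """

    def __init__(self, num_channels: int, num_steps: int, num_groups: int = 2):
        super().__init__()
        assert num_steps % num_groups == 0, "time steps must divide evenly into groups"
        self.steps_per_group = num_steps // num_groups
        self.norms = nn.ModuleList(nn.BatchNorm2d(num_channels) for _ in range(num_groups))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t, b, c, h, w = x.shape
        outs = []
        for g, norm in enumerate(self.norms):
            # Take the time steps belonging to this group and fold them into the
            # batch dimension so BatchNorm2d sees (steps_per_group * B, C, H, W).
            chunk = x[g * self.steps_per_group:(g + 1) * self.steps_per_group]
            flat = chunk.reshape(-1, c, h, w)
            outs.append(norm(flat).reshape(-1, b, c, h, w))
        # Reassemble the groups along the time dimension.
        return torch.cat(outs, dim=0)


if __name__ == "__main__":
    bgn = TemporalBatchGroupNorm(num_channels=64, num_steps=4, num_groups=2)
    x = torch.randn(4, 8, 64, 16, 16)  # (T, B, C, H, W)
    print(bgn(x).shape)  # torch.Size([4, 8, 64, 16, 16])

Under this reading, plain BN corresponds to the single-group case (num_groups = 1), while larger group counts let normalization statistics, and hence average firing rates, be balanced per temporal group.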