Fermentation is a key step in developing the flavor quality of black tea, yet the degree of fermentation is difficult to evaluate during processing. This paper proposes an improved 3D-SwinT-CNN network that processes hyperspectral images of black tea end to end to evaluate the degree of fermentation. The model combines dilated convolution with a shifted-window self-attention mechanism, which expands the network's receptive field and captures both local spatial-spectral features and global features of the hyperspectral images. The model achieves an accuracy of 98.13% on the test set, an improvement of 3.45% over methods based on manual feature extraction. Compared with baseline 3D-CNNs, 3D-SwinT-CNN shows superior spatial-spectral feature extraction, with an average accuracy improvement of 10.17%. Ablation experiments further verify the effectiveness of the introduced modules. The proposed 3D-SwinT-CNN is intended to provide a research foundation for online evaluation of the fermentation degree of black tea.
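To illustrate why dilated convolution expands the receptive field, the following minimal sketch computes the receptive field along a single axis for a stack of convolution layers. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions. The layer configurations below (three 3-wide kernels with dilation rates 1, 2, and 4) are assumptions chosen for illustration only and are not the configuration of the proposed network; the function names are likewise hypothetical.

```python
from typing import List, Tuple


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Receptive field along one axis of a stack of convolution layers.

    Each layer is given as (kernel_size, dilation, stride).
    """
    rf = 1    # a single output position initially sees one input position
    jump = 1  # distance in input positions between adjacent outputs
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Illustrative (assumed) configurations, not taken from the paper.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and the same number of weights per kernel, the dilated stack in this example covers roughly twice as many positions per axis, which is the effect the dilated convolution is described as providing.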