In this paper, we propose a novel architecture for defect detection in electroluminescence images of polycrystalline silicon solar cells, addressing the challenges posed by subtle and spatially dispersed defects. Our model, based on a modified Swin Transformer, incorporates three key innovations that strengthen feature extraction and fusion. First, we replace the conventional self-attention mechanism with a group self-attention mechanism, raising the mAP50:5:95 score from 50.12% to 52.98% while cutting inference time from 74 ms to 62 ms. Second, we introduce a spatial displacement module based on shift convolution that replaces the traditional Multi-Layer Perceptron, further enlarging the model's receptive field and improving both precision and recall. Third, our fast multi-scale feature fusion mechanism combines high-resolution detail with high-level semantic features from different network stages, improving defect detection accuracy. Experimental results on the PVEL-AD dataset demonstrate that our model achieves the highest mAP50 of 83.11% and an F1-score of 84.33%, surpassing state-of-the-art models while maintaining a competitive inference time of 66.3 ms. These findings highlight the effectiveness of our innovations in improving both detection accuracy and computational efficiency, making our model a robust solution for quality assurance in solar cell manufacturing.
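To make the first innovation concrete, the sketch below illustrates one plausible form of group self-attention in PyTorch: the channel dimension is partitioned into groups and attention is computed within each group, which reduces the per-token cost relative to full self-attention. The class name `GroupSelfAttention`, the channel-wise grouping, and the output projection are our assumptions for illustration; the abstract does not specify the paper's exact grouping scheme.

```python
# Minimal sketch of a group self-attention layer, assuming the groups
# partition the channel dimension. The paper's actual grouping strategy
# (channel groups vs. window groups) is not specified in the abstract.
import torch
import torch.nn as nn

class GroupSelfAttention(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_groups: int = 4, num_heads: int = 4):
        super().__init__()
        assert dim % num_groups == 0, "dim must divide evenly into groups"
        self.num_groups = num_groups
        group_dim = dim // num_groups
        # One small attention block per channel group: attending within a
        # group costs roughly O(N^2 * dim / G) instead of O(N^2 * dim).
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(group_dim, num_heads, batch_first=True)
            for _ in range(num_groups)
        )
        self.proj = nn.Linear(dim, dim)  # re-mix information across groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        chunks = x.chunk(self.num_groups, dim=-1)
        out = [attn(c, c, c, need_weights=False)[0]
               for attn, c in zip(self.attn, chunks)]
        return self.proj(torch.cat(out, dim=-1))
```

The cheaper per-group attention is consistent with the reported drop in inference time (74 ms to 62 ms), since compute scales with the per-group width rather than the full embedding width.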
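For the second innovation, a common way to realize spatial displacement via shift convolution is to move channel groups one pixel in the cardinal directions (a zero-parameter operation) before the pointwise layers that replace the MLP. The sketch below follows that pattern; the class name `ShiftConvFFN`, the four-direction shift, and the expansion ratio are assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn as nn

class ShiftConvFFN(nn.Module):  # hypothetical name
    """Feed-forward block where a zero-parameter spatial shift precedes the
    pointwise layers, enlarging the receptive field at negligible cost."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, dim * expansion, kernel_size=1)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(dim * expansion, dim, kernel_size=1)

    @staticmethod
    def spatial_shift(x: torch.Tensor) -> torch.Tensor:
        # Shift four channel groups one pixel in the four cardinal
        # directions; the remainder group stays in place. Directions and
        # group sizes are illustrative assumptions.
        b, c, h, w = x.shape
        g = c // 5
        out = torch.zeros_like(x)
        out[:, 0*g:1*g, :, 1:] = x[:, 0*g:1*g, :, :-1]   # shift right
        out[:, 1*g:2*g, :, :-1] = x[:, 1*g:2*g, :, 1:]   # shift left
        out[:, 2*g:3*g, 1:, :] = x[:, 2*g:3*g, :-1, :]   # shift down
        out[:, 3*g:4*g, :-1, :] = x[:, 3*g:4*g, 1:, :]   # shift up
        out[:, 4*g:] = x[:, 4*g:]                         # identity
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, height, width)
        return self.fc2(self.act(self.fc1(self.spatial_shift(x))))
```

Because the shift itself adds no parameters or multiply-accumulates, each token's output mixes information from its four neighbors, which is how such a block can widen the receptive field relative to a plain token-wise MLP.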
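Finally, the multi-scale fusion can be pictured as an FPN-style combination of stage outputs: each stage is projected to a shared width with 1x1 convolutions, upsampled to the finest resolution, and summed. This is a minimal sketch under assumptions; the default stage widths (96, 192, 384, 768) follow a Swin-T-like backbone, and the nearest-neighbor upsampling and summation are our illustrative choices rather than the paper's confirmed fusion rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastMultiScaleFusion(nn.Module):  # hypothetical name
    """Fuse high-resolution detail with high-level semantics by projecting
    each stage to a shared width, upsampling, and summing."""
    def __init__(self, in_dims=(96, 192, 384, 768), out_dim=128):
        super().__init__()
        # 1x1 projections keep the fusion cheap (no large kernels).
        self.proj = nn.ModuleList(nn.Conv2d(d, out_dim, 1) for d in in_dims)
        self.smooth = nn.Conv2d(out_dim, out_dim, 3, padding=1)

    def forward(self, feats):
        # feats: list of (batch, C_i, H_i, W_i) tensors, ordered fine -> coarse
        target = feats[0].shape[-2:]  # fuse at the finest resolution
        fused = 0
        for proj, f in zip(self.proj, feats):
            p = proj(f)
            if p.shape[-2:] != target:
                p = F.interpolate(p, size=target, mode="nearest")
            fused = fused + p
        return self.smooth(fused)
```

Summation at the finest scale keeps the fusion cost linear in the number of stages, which fits the abstract's emphasis on combining accuracy gains with a competitive overall inference time.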