Few-shot learning aims to classify novel categories from only a few labeled samples. Although metric-based meta-learning has shown strong generalization ability as a few-shot classification method, it remains sensitive to data noise and struggles to keep inter-sample distances stable. To address these issues, we propose a few-shot learning approach that enhances the global and local semantic representation of image features. First, a multiscale residual module extracts multi-granularity features from images. Next, the self-attention mechanism of a Transformer module fuses the local and global features. Finally, a weighted metric module improves the model's robustness to noise. Experiments on the CIFAR-FS and MiniImageNet few-shot datasets in the 5-way 1-shot and 5-way 5-shot settings show that the approach captures multi-level and multi-granularity image representations. Compared with other methods, our method improves accuracy by 2.63% and 1.27%, respectively, in the 5-shot setting on these two datasets. These results indicate that the model improves few-shot image classification performance.
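To make the idea of a weighted metric concrete, the following is a minimal sketch of one possible weighting scheme for a metric-based few-shot classifier. It is not the weighted metric module described above, whose exact form is not given here. As an assumption, each support sample is weighted by a softmax over its negative squared distance to the class mean, so that outlying (noisy) samples contribute less to the class prototype. The function names `weighted_prototypes` and `classify`, the `temperature` parameter, and the synthetic 8-dimensional features are all hypothetical.

```python
import numpy as np


def weighted_prototypes(support, labels, n_way, temperature=1.0):
    """Build one prototype per class as a weighted mean of its support features.

    support: (N, D) array of embedded support samples.
    labels:  (N,) integer class ids in [0, n_way).
    Assumed scheme: samples far from their class mean get smaller weights.
    """
    prototypes = np.zeros((n_way, support.shape[1]))
    for c in range(n_way):
        feats = support[labels == c]                      # (K, D)
        sq_dist = np.sum((feats - feats.mean(axis=0)) ** 2, axis=1)
        # Softmax over negative distances, shifted for numerical stability.
        weights = np.exp(-(sq_dist - sq_dist.min()) / temperature)
        weights /= weights.sum()
        prototypes[c] = weights @ feats
    return prototypes


def classify(query, prototypes):
    """Assign each query to the class of its nearest prototype."""
    diff = query[:, None, :] - prototypes[None, :, :]     # (Q, n_way, D)
    return np.sum(diff ** 2, axis=2).argmin(axis=1)


# Toy 5-way 5-shot episode with synthetic features (not real image embeddings).
rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 5, 8
centers = 10.0 * np.eye(n_way, dim)
labels = np.repeat(np.arange(n_way), k_shot)
support = centers[labels] + 0.1 * rng.standard_normal((n_way * k_shot, dim))
support[0, -1] += 5.0                                     # one noisy sample in class 0

query_labels = np.repeat(np.arange(n_way), 3)
query = centers[query_labels] + 0.1 * rng.standard_normal((n_way * 3, dim))

prototypes = weighted_prototypes(support, labels, n_way)
accuracy = np.mean(classify(query, prototypes) == query_labels)
print(f"toy episode accuracy: {accuracy:.2f}")
```

In this toy episode, the injected outlier pulls the unweighted class mean away from the true class center, while the weighted prototype largely ignores it. In the 1-shot case the weighting reduces to the single support sample, so any benefit of such a scheme would appear only when several shots are available.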