Deep learning is widely used in side-channel attacks. However, the effectiveness of deep learning attacks can be degraded by differences in the data collected from different devices or under different environmental conditions. These differences arise from factors such as hardware discrepancies, environmental noise, and countermeasures. They can disrupt the correlation between the sensitive data and the information leaked by the target device during operation, causing attacks to fail. To better extract the feature and temporal information of traces under these conditions, we propose an Attention Mechanism and Multi-Scale Convolutional Neural Network (AMCNNet). The critical components of AMCNNet are the multi-scale convolution module and the feature extraction module. The multi-scale convolution module extracts features at different scales from the traces, improving the network's ability to capture both high-level semantics and fine details. The feature extraction module then dynamically weights and rescales these features to better capture the long-term dependencies in the traces. Experiments on the publicly available DPA Contest v4, ASCAD, and AES_HD datasets, evaluated using guessing entropy, demonstrate that the network generalizes well and is robust across datasets with diverse characteristics.
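For readers unfamiliar with the evaluation metric, guessing entropy is commonly estimated as the average rank of the correct key over repeated attacks on random subsets of the attack traces. The following is a minimal sketch of that estimate, not the evaluation code used in this work. The function names, the `scores` layout (one row of per-key log-likelihoods per trace), and the defaults for `n_experiments` and `seed` are assumptions made for illustration.

```python
import random


def key_rank(scores, true_key):
    """Rank of the correct key after summing per-trace log-likelihoods.

    scores[t][k] is the (assumed) log-likelihood of key guess k for trace t.
    Rank 0 means the correct key has the highest accumulated score.
    Ties are resolved in favor of the correct key (strict comparison).
    """
    n_keys = len(scores[0])
    totals = [0.0] * n_keys
    for row in scores:
        for k in range(n_keys):
            totals[k] += row[k]
    return sum(1 for k in range(n_keys) if totals[k] > totals[true_key])


def guessing_entropy(scores, true_key, n_attack_traces,
                     n_experiments=100, seed=0):
    """Average key rank over random subsets of n_attack_traces traces."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    ranks = []
    for _ in range(n_experiments):
        subset = rng.sample(scores, n_attack_traces)  # without replacement
        ranks.append(key_rank(subset, true_key))
    return sum(ranks) / len(ranks)
```

A guessing entropy that reaches 0 with few attack traces indicates that the model's scores consistently place the correct key first. In practice, `scores` would be derived from the network's output probabilities mapped to key hypotheses through the targeted intermediate value.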