Recently, emotion analysis has played an important role in the field of artificial intelligence, particularly in the study of speech emotion analysis, which can help understand one of the most direct ways of human emotional communication—speech. This study focuses on the emotion analysis of infant crying. Within cries lies a variety of information, including hunger, pain, and discomfort. This paper proposes an improved classification model using ResNet and transformer. It utilizes modified Mel-frequency cepstral coefficient Mel-frequency cepstral coefficient (MFCC) features obtained through feature engineering from infant cries and integrates SE attention mechanism modules into residual blocks to enhance the model’s ability to adjust channel weights. The proposed method achieved 93% accuracy rate in experiments, offering advantages of shorter training time and higher accuracy compared to other traditional models. It provides an efficient and stable solution for infant cry classification.