PurposeThe anomaly detection task for oil and gas pipelines based on acoustic signals faces issues such as background noise coverage, lack of effective features, and small sample sizes, resulting in low fault identification accuracy and slow efficiency. The purpose of this paper is to study an accurate and efficient method of pipeline anomaly detection.Design/methodology/approachFirst, to address the impact of background noise on the accuracy of anomaly signals, the adaptive multi-threshold center frequency variational mode decomposition method(AMTCF-VMD) method is used to eliminate strong noise in pipeline signals. Secondly, to address the strong data dependency and loss of local features in the Swin Transformer network, a Hybrid Pyramid ConvNet network with an Agent Attention mechanism is proposed. This compensates for the limitations of CNN’s receptive field and enhances the Swin Transformer’s global contextual feature representation capabilities. Thirdly, to address the sparsity and imbalance of anomaly samples, the SpecAugment and Scaper methods are integrated to enhance the model’s generalization ability.FindingsIn the pipeline anomaly audio and environmental datasets such as ESC-50, the AMTCF-VMD method shows more significant denoising effects compared to wavelet packet decomposition and EMD methods. Additionally, the model achieved 98.7% accuracy on the preprocessed anomaly audio dataset and 99.0% on the ESC-50 dataset.Originality/valueThis paper innovatively proposes and combines the AMTCF-VMD preprocessing method with the Agent-SwinPyramidNet model, addressing noise interference and low accuracy issues in pipeline anomaly detection, and providing strong support for oil and gas pipeline anomaly recognition tasks in high-noise environments.