Long-distance pipelines have become the main mode of transporting oil and gas as the demand for fossil energy increases. The complex environments that such pipelines traverse make them susceptible to surface defects, which pose a major safety threat. Magnetic flux leakage (MFL) detection is a non-destructive testing (NDT) method that can be used to map and monitor pipeline defects. However, the accuracy and reliability of MFL detection can be severely compromised by the data processing and analytics methods used. This paper presents a novel hybrid method for detecting pipeline defects that combines two well-established computer vision algorithms, YOLOv5 and Vision Transformer (ViT), enabling accurate detection and classification of pipeline defects simultaneously. An in-house laboratory dataset generated by Pull Through Testing (PTT) was used for model training and verification. It contains artificially introduced pipeline defects, e.g., metal corrosion, geometrical defects, metal material loss, and geometrical structures. The effect of different model architectures on the defect identification performance was investigated. Comparisons show that the proposed cascaded approach surpasses the YOLOv5 algorithm alone in pipeline defect classification accuracy, while still preserving its versatile, high-precision defect detection capability.