Accurate detection and localization of polyps during endoscopic examinations are critical for early disease diagnosis and cancer prevention. However, artifacts and noise, together with the high similarity between polyps and surrounding tissue in color, shape, and texture, complicate polyp detection in video frames. To tackle these challenges, we applied multivariate regression analysis to refine the model and introduced the Noise-Suppressing Perception Network (NSPNet). NSPNet leverages the wavelet transform to strengthen the model's resistance to noise and artifacts, and adopts a multi-frame collaborative detection strategy for dynamic polyp detection in endoscopic videos, efficiently exploiting temporal information to reinforce features across frames. Specifically, we designed a High-Low Frequency Feature Fusion (HFLF) framework that allows the model to capture high-frequency details more effectively. In addition, we introduced an improved STFT-LSTM Polyp Detection (SLPD) module that uses temporal information from video sequences to enhance feature fusion in dynamic environments. Finally, we integrated an Image Augmentation Polyp Detection (IAPD) module that improves performance on unseen data through preprocessing-based augmentation strategies. Extensive experiments demonstrate that NSPNet outperforms nine state-of-the-art (SOTA) methods across four datasets on key performance metrics, including F1-score and recall.
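The abstract does not specify how the HFLF framework is implemented, but the underlying idea of separating a frame into wavelet sub-bands and re-weighting the high-frequency branch before fusing can be illustrated with a minimal sketch. The function name `hflf_sketch`, the Haar wavelet, and the `hf_gain` parameter below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np
import pywt


def hflf_sketch(frame: np.ndarray, hf_gain: float = 1.5) -> np.ndarray:
    """Toy high/low-frequency split-and-fuse on a grayscale frame.

    A single-level 2D Haar DWT separates the frame into a low-frequency
    approximation (cA) and high-frequency detail bands (cH, cV, cD).
    The detail bands, which carry edges and fine texture, are amplified
    before the inverse transform recombines the two branches.
    """
    cA, (cH, cV, cD) = pywt.dwt2(frame.astype(np.float64), "haar")
    # Emphasize high-frequency detail relative to the smooth
    # low-frequency content, then fuse the branches back together.
    fused = pywt.idwt2((cA, (hf_gain * cH, hf_gain * cV, hf_gain * cD)), "haar")
    return np.clip(fused, 0.0, 255.0)


if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(256, 256)).astype(np.float64)
    out = hflf_sketch(frame)
    print(out.shape)  # (256, 256)
```

In the paper's setting, a learned fusion of the two frequency branches would presumably replace the fixed gain used here; the sketch only conveys the decomposition-then-fusion principle.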