Abstract

Cervical cancer is the fourth most frequent malignancy worldwide, and early detection is essential for its treatment. The Pap smear test is the established approach for identifying cervical cancer, but its reliability depends on the proficiency of the healthcare professionals who interpret it. Computer-aided diagnosis (CADx) systems use deep learning and medical image analysis to improve the accuracy and speed of diagnosis. Nonetheless, the adoption of these systems faces obstacles such as insufficient data, variation between images, and image-quality issues. This article presents an advanced architectural framework, the Multi-Axis Vision Transformer (MaxViT), designed to address these challenges. Adapting MaxViT to Pap smear data yields a lightweight structure that offers superior accuracy and inference speed. To improve the proposed model's performance, we replaced the MBConv blocks in the MaxViT architecture with ConvNeXtv2 blocks and the MLP blocks with GRN-based MLPs. This modification not only reduced the parameter count but also enhanced the model's generalization capability. The proposed method was evaluated on the publicly available SIPaKMeD and Mendeley LBC Pap smear datasets, alongside a total of 106 deep learning models (53 CNNs and 53 vision transformers) for each dataset. Compared with these experimental models and with state-of-the-art methods, the proposed method surpassed the existing literature and all of the deep learning models tested, achieving 99.02% accuracy on the SIPaKMeD dataset and 99.48% on the LBC dataset. With 106 deep learning models employed, this study stands out as the most extensive and comprehensive effort to diagnose cervical cancer from Pap smear images.
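
The abstract names GRN-based MLPs without defining them. As a rough illustration only, the sketch below follows the Global Response Normalization formulation commonly associated with ConvNeXt V2 and places it inside a two-layer MLP. The function names (grn, gelu, linear, grn_mlp), the list-based tensor layout, and the Linear, GELU, GRN, Linear ordering are assumptions made for this example; they are not taken from the paper and may differ from the authors' implementation.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global Response Normalization (assumed ConvNeXt V2 style formulation).

    x: list of spatial positions, each a list of C channel values.
    gamma, beta: per-channel scale and shift (learnable in a real model).
    """
    channels = len(x[0])
    # 1) Global aggregation: L2 norm of each channel over all positions.
    norms = [math.sqrt(sum(pos[c] ** 2 for pos in x)) for c in range(channels)]
    # 2) Divisive normalization: each norm relative to the mean norm.
    mean_norm = sum(norms) / channels
    scale = [g / (mean_norm + eps) for g in norms]
    # 3) Calibrate the input and add it back as a residual.
    return [
        [gamma[c] * pos[c] * scale[c] + beta[c] + pos[c] for c in range(channels)]
        for pos in x
    ]


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Per-position linear layer; weight[j][i] maps input i to output j."""
    return [
        [sum(w * v for w, v in zip(row, pos)) + b for row, b in zip(weight, bias)]
        for pos in x
    ]


def grn_mlp(x, w1, b1, w2, b2, gamma, beta):
    """Hypothetical GRN-based MLP: Linear, GELU, GRN, Linear."""
    hidden = [[gelu(v) for v in pos] for pos in linear(x, w1, b1)]
    hidden = grn(hidden, gamma, beta)
    return linear(hidden, w2, b2)
```

In this sketch, setting gamma and beta to zero makes grn return its input unchanged, so the layer can start as an identity mapping and learn how strongly to rescale channels during training.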
