Abstract

About half a million women worldwide are affected by cervical cancer, and about 0.3 million deaths per year are attributed to it. Cytologists screen Pap smear images of cervical cells manually, and this manual screening is prone to error. Automated computer-aided detection systems have therefore been proposed for the classification of cervical cancer cell images. In this work, an ensemble of a Vision Transformer (ViT) network and a convolutional neural network (CNN) is proposed for the classification of cervical cell Pap smear images. ViT is known for its minimal inductive bias and for classification performance that is competitive with state-of-the-art convolutional neural networks. Fine-tuning a large ViT network is computationally intensive; therefore, as an alternative to the ViT-CNN approach, a second transfer learning-based approach is also proposed, in which features extracted from pre-trained CNNs are combined and classified with a resource-efficient Long Short-Term Memory (LSTM) network. The two approaches are compared on the basis of classification performance, test time, generalization ability and attention maps. Experimental results show that the ViT-CNN ensemble approach achieved 97.65% classification accuracy, whereas the LSTM-based approach achieved 95.80%. The ViT-CNN ensemble achieves the better classification accuracy at a high computational cost, in particular a large amount of random access memory (RAM) on the graphics processing unit (GPU), whereas the CNN-LSTM approach is less accurate but computationally cheaper.
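
The abstract does not state how the ViT and CNN outputs are combined. As a purely illustrative sketch, one common way to ensemble two classifiers is soft voting, that is, averaging their class probabilities. The function names, the `vit_weight` parameter, and the logits below are all hypothetical and are not taken from the paper.

```python
from math import exp


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(vit_logits, cnn_logits, vit_weight=0.5):
    """Assumed fusion rule: weighted mean of the two models' class probabilities.

    Returns the index of the winning class and the fused probabilities.
    """
    p_vit = softmax(vit_logits)
    p_cnn = softmax(cnn_logits)
    fused = [vit_weight * a + (1.0 - vit_weight) * b
             for a, b in zip(p_vit, p_cnn)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


# Made-up logits for a three-class problem; not data from the paper.
label, probs = soft_vote([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
```

With equal weights, a model that is confident about a class can outvote a less confident one. Changing `vit_weight` shifts the decision toward either network.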
