Abstract

Images are increasingly used for AI-based diagnosis and analysis of many conditions, such as cervical cancer, oral cancer, and glucose analysis from retinal images. In many cases, the data is collected by specialised camera modules that capture images of the affected areas. As with any other data source, this process is error-prone, and the captured images may contain unwanted objects and regions that need to be removed. Outliers in such datasets can adversely affect the performance of machine learning models, so cleaning the data before training is of utmost importance. Manual cleaning, however, is tedious, especially when the data is collated from different sources. In this paper, we propose a Few-Shot learning based model, pre-trained in a supervised contrastive learning setting, to automate the process of data cleaning. Our model learns the dataset distribution and distinguishes accurate data points from noisy ones. We also show that scaling up the model can greatly improve Few-Shot performance. On the noisy MobileODT cervical dataset, collected from Kaggle, an EfficientNet architecture obtained 52% accuracy on the classification task without data cleaning, whereas the same architecture with ROI cropping achieved 76.56% accuracy after cleaning with the proposed Deep Cleaner approach, which requires only 100 clean images. The proposed approach performs 2.74% better than a denoising auto-encoder, which is considered a powerful anomaly detection technique.
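The abstract does not spell out how the learned representation is used to separate clean from noisy images. As a rough illustration only, the sketch below assumes that a pre-trained encoder has already mapped each image to an embedding vector, and that an image is flagged as noisy when its cosine similarity to a prototype built from the small clean set falls below a threshold. The function names, the prototype rule, and the threshold value are all hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length; eps guards against division by zero.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def fit_clean_prototype(clean_embeddings):
    # Hypothetical step: average the normalised embeddings of the few
    # clean support images (e.g. about 100) and renormalise the mean.
    proto = l2_normalize(clean_embeddings).mean(axis=0, keepdims=True)
    return l2_normalize(proto)[0]

def flag_noisy(embeddings, prototype, threshold):
    # True where cosine similarity to the clean prototype is below
    # the (assumed, illustrative) threshold.
    sims = l2_normalize(embeddings) @ prototype
    return sims < threshold

# Toy 2-D embeddings standing in for encoder outputs.
clean = np.array([[1.0, 0.1], [0.9, -0.1], [1.0, 0.0]])
candidates = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])

prototype = fit_clean_prototype(clean)
mask = flag_noisy(candidates, prototype, threshold=0.5)
print(mask.tolist())  # -> [False, True, True]
```

In practice the threshold would have to be chosen on held-out data, and a contrastively pre-trained encoder is what would make such a simple distance rule plausible in the first place.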
