A New Image Data Set and Benchmark for Cervical Dysplasia Classification Evaluation

Tao Xu, Zhiyun Xue, Edward Kim, Xiaolei Huang, Sameer Antani, Cheng Xin, L Rodney Long

doi:10.1007/978-3-319-24888-2_4


Open Access



Abstract

Cervical cancer is one of the most common types of cancer in women worldwide. Most deaths from cervical cancer occur in less developed areas of the world. In this work, we introduce a new image dataset, along with ground-truth diagnoses, for evaluating image-based cervical disease classification algorithms. We collect a large number of cervigram images from a database provided by the US National Cancer Institute. From these images, we extract three types of complementary image features: Pyramid histogram in L*A*B* color space (PLAB), Pyramid Histogram of Oriented Gradients (PHOG), and Pyramid histogram of Local Binary Patterns (PLBP). PLAB captures color information, PHOG encodes edge and gradient information, and PLBP extracts texture information. Using these features, we run seven classic machine-learning algorithms to differentiate images of high-risk patient visits from those of low-risk patient visits. Extensive experiments are conducted on both balanced and imbalanced subsets of the data to compare the seven classifiers. These results can serve as a baseline for future research in cervical dysplasia classification using images. The image-based classifiers also outperform several other screening tests on the same datasets.
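
The three features named above share a spatial-pyramid histogram structure. The following is a minimal sketch of that shared idea only, not the authors' implementation. The function name `pyramid_histogram`, the bin count, the number of pyramid levels, and the final normalization are all assumptions chosen for illustration; the abstract does not specify them. The input is a single 2D channel (for example one color channel, a gradient-orientation map, or an LBP code map scaled to the given value range).

```python
import numpy as np


def pyramid_histogram(channel, n_bins=8, n_levels=3, value_range=(0.0, 1.0)):
    """Concatenate per-cell histograms of a 2D channel over a spatial pyramid.

    Hypothetical helper: n_bins, n_levels, and value_range are illustrative
    assumptions, not values taken from the paper.
    """
    channel = np.asarray(channel, dtype=float)
    features = []
    for level in range(n_levels):
        cells = 2 ** level  # level 0: whole image, level 1: 2x2 grid, ...
        row_edges = np.linspace(0, channel.shape[0], cells + 1).astype(int)
        col_edges = np.linspace(0, channel.shape[1], cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                cell = channel[row_edges[i]:row_edges[i + 1],
                               col_edges[j]:col_edges[j + 1]]
                hist, _ = np.histogram(cell, bins=n_bins, range=value_range)
                features.append(hist)
    vec = np.concatenate(features).astype(float)
    total = vec.sum()
    # Normalize so the descriptor does not depend on image size (an assumption).
    return vec / total if total > 0 else vec


# Usage on a synthetic channel; real input would be a channel of a cervigram.
rng = np.random.default_rng(0)
image_channel = rng.random((64, 64))
descriptor = pyramid_histogram(image_channel)
print(descriptor.shape)
```

With three levels the descriptor concatenates 1 + 4 + 16 = 21 cell histograms, so its length is 21 times the bin count. Descriptors computed this way for color, gradient, and texture channels could then be concatenated and passed to any standard classifier.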

Full Text