Learning from imbalanced data: A comprehensive comparison of classifier performance for bleeding detection in endoscopic video

Farah Deeba, Francis M. Bui, Shahed K. Mohammed, Khan A. Wahid

doi:10.1109/iciev.2016.7760150

Abstract

Imbalanced data is an inevitable problem in many real-world applications, including bleeding detection in endoscopic videos, where a few clinically significant examples are outnumbered by normal examples. In this paper, we present a comprehensive analysis of the performance of six different classifiers under different class distributions of the training dataset. We address two questions: (1) Is there any advantage to using a certain classifier over the others? (2) For the bleeding detection problem, what is the optimal range of class distribution in the training dataset? To answer these questions, we built seven training sets with different class distributions. Besides the standard performance metrics, we define a metric that measures the robustness of a classifier, which we use to obtain the optimal range of class distribution for that classifier. In our experiments, a balanced training set yields the best performance for all classifiers. Ensemble classifiers are more robust to variation in the training dataset than single classifiers.
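
As an illustration of the experimental setup described in the abstract, the following is a minimal sketch of how training sets with different class distributions might be drawn from a pool of labelled frames. It is not the authors' procedure: the function name `make_training_set`, the pool sizes, the training set size, and the seven bleeding fractions are all assumptions chosen for the example, since the abstract does not state them.

```python
import random

def make_training_set(positives, negatives, pos_fraction, size, seed=0):
    """Draw `size` labelled examples so that `pos_fraction` of them are positive.

    Hypothetical helper for illustration; sampling is without replacement.
    """
    if not 0.0 <= pos_fraction <= 1.0:
        raise ValueError("pos_fraction must be in [0, 1]")
    n_pos = round(size * pos_fraction)
    n_neg = size - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples for the requested distribution")
    rng = random.Random(seed)
    sample = [(x, 1) for x in rng.sample(positives, n_pos)]   # label 1: bleeding
    sample += [(x, 0) for x in rng.sample(negatives, n_neg)]  # label 0: normal
    rng.shuffle(sample)
    return sample

# Hypothetical pools standing in for extracted frame features.
bleeding = [f"bleed_{i}" for i in range(200)]
normal = [f"normal_{i}" for i in range(2000)]

# Seven assumed bleeding fractions; the paper's actual values are not given here.
fractions = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
training_sets = {
    p: make_training_set(bleeding, normal, p, size=200) for p in fractions
}
```

Each classifier under comparison could then be trained once per entry in `training_sets` and scored on a common held-out test set, so that any difference in performance is attributable to the class distribution of the training data.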

Full Text