Abstract

The adverse consequences of the class imbalance problem are prevalent in classification for disease diagnosis. Classifiers trained on imbalanced data predict most negative-class examples (instances with a non-diseased label) correctly but fail to make correct predictions for positive-class examples (instances with a diseased label). Since misclassification costs can be very high in a sensitive field like disease diagnosis, addressing class imbalance is of utmost importance. A number of sampling techniques have been applied to balance data. However, these techniques reduce the overall accuracy of classifier models because of a trade-off between the sensitivity and specificity of such classifiers. This paper first proposes a genetic algorithm (GA)-based undersampling technique with a weighted fitness function to determine the trade-off between sensitivity and specificity, followed by a multi-objective genetic algorithm (MOGA) approach to address the class imbalance problem for disease diagnosis. Determining the trade-off between sensitivity and specificity manually is an arduous task. The MOGA approach takes the two extreme training samples from the Pareto-optimal solutions: one optimally tuned for sensitivity and the other optimally tuned for specificity on validation data. A decision tree classification model is built on each of these two training sets; the models are named the sensitivity prioritized model (SEPM) and the specificity prioritized model (SPPM), respectively. These models are combined to make predictions on the test data. The results of extensive experimentation confirm that the proposed multi-objective scheme makes correct predictions on the minority class (sensitivity, SE) without compromising the correct prediction rate on the majority class (specificity, SP).
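
As a rough illustration of the quantities named above, the following Python sketch computes sensitivity and specificity, combines them into a weighted fitness score, and picks the two extreme points of a Pareto front. It is not the authors' implementation. The abstract does not give the form of the fitness function, so the weighted sum `w * SE + (1 - w) * SP`, the weight `w`, and all function names here are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (SE, SP) for binary labels, where 1 = diseased and 0 = non-diseased."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return se, sp


def weighted_fitness(se: float, sp: float, w: float = 0.5) -> float:
    """Assumed single-objective fitness: a weighted sum of SE and SP."""
    return w * se + (1.0 - w) * sp


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of (SE, SP) points not dominated by any other point (both maximized)."""
    front = []
    for i, (se_i, sp_i) in enumerate(points):
        dominated = any(
            se_j >= se_i and sp_j >= sp_i and (se_j > se_i or sp_j > sp_i)
            for j, (se_j, sp_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def extreme_solutions(points: Sequence[Tuple[float, float]]) -> Tuple[int, int]:
    """Indices of the front members with the highest SE and the highest SP."""
    front = pareto_front(points)
    best_se = max(front, key=lambda i: (points[i][0], points[i][1]))
    best_sp = max(front, key=lambda i: (points[i][1], points[i][0]))
    return best_se, best_sp
```

In this sketch each point would be the validation-set (SE, SP) of a decision tree trained on one candidate undersampled training set. The two indices returned by `extreme_solutions` would then identify the training sets used to build the sensitivity prioritized and specificity prioritized models. How the two models' predictions are combined is not described in the abstract and is left out here.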
