Abstract

Class imbalanced medical datasets, such as those used for cancer prediction, contain unequal numbers of samples in different classes. The resulting skewed class distribution makes it very difficult for a classifier to distinguish between the minority (i.e., cancer) and majority (i.e., non-cancer) classes. Related studies in the literature have proposed different types of solutions to the class imbalance problem, including data-level, algorithm-level, and cost-sensitive learning approaches. However, none of these solutions has considered the issue of missing attribute values in class imbalanced medical datasets, especially in the minority class. Missing value imputation is a common remedy, in which statistical or machine learning techniques are used to build models that estimate replacements for the missing values. However, existing imputation methods require a certain amount of observed data to produce their estimations. Their major challenge is that the amount of observed data (with no missing values) in the minority class is very limited, or that some of the minority data are incomplete. In this paper, we propose a novel approach, namely Dynamic Time Warping-based Imputation (DTWI), to handle class imbalanced datasets with missing values. Because DTWI relies on the DTW similarity measurement technique, all of the data (with or without missing values) in the minority class can be used for missing value imputation. The experimental results based on 10 different class imbalanced medical datasets show that when the missing rates in the minority classes are smaller than 30%, DTWI performs similarly to the baseline K-NN imputation method and better than the mean/mode imputation and case deletion methods. When the missing rates are larger than 30%, DTWI significantly outperforms the other techniques.
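The abstract does not spell out the exact DTWI procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: attributes are already normalized to a common scale, a record's missing entries (`None`) are simply dropped before comparison, the DTW distance between the remaining observed values serves as the similarity measure, and each missing attribute is filled with the mean of that attribute over the `k` nearest minority-class records that observed it. Because DTW can align sequences of different lengths, incomplete records remain usable as neighbours. The names `dtw_distance`, `observed`, and `dtw_impute` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional

# One minority-class record; None marks a missing attribute value (assumption).
Record = List[Optional[float]]


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return math.inf
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def observed(record: Record) -> List[float]:
    """Keep only the observed (non-missing) values, in attribute order."""
    return [v for v in record if v is not None]


def dtw_impute(records: List[Record], k: int = 1) -> List[Record]:
    """Hypothetical DTW-based k-NN imputation within a single (minority) class."""
    result = [list(r) for r in records]
    for idx, rec in enumerate(records):
        missing = [j for j, v in enumerate(rec) if v is None]
        if not missing:
            continue
        target = observed(rec)
        for j in missing:
            # Donors: every other record that observed attribute j, complete or not.
            donors = [
                (dtw_distance(target, observed(other)), other[j])
                for t, other in enumerate(records)
                if t != idx and other[j] is not None
            ]
            if not donors:
                continue  # nobody observed attribute j; leave it missing
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            result[idx][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Toy minority class: the first two records are incomplete.
minority: List[Record] = [
    [0.2, 0.4, None, 0.8],
    [0.1, 0.5, 0.6, None],
    [0.9, 0.1, 0.3, 0.2],
]
imputed = dtw_impute(minority, k=1)
print(imputed)
```

In this toy run the first record's missing third attribute is taken from the second record, which is itself incomplete. This illustrates why a DTW-style comparison can draw on all minority data, whereas a K-NN imputer restricted to complete records would have only one donor here.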
