Abstract

Thousands of patients around the world affecting their health with various factor as age, body mass index, cholesterol levels, albumin levels and several other factor. Prediction of health outcome due to these factors at a proper time can be served as an early warning. Recent growth in machine learning algorithm inspired us to build a predictive model for better healthcare facilities. In our work we have focused on problem of noisy and imbalanced dataset in which majority class is favored over minority one that leads to false prediction. We have experimented with two publicly available medical imbalanced dataset which varies in its size as MIT’s GOSSIS death and PIMA Indians Diabetes Dataset based on binary class. In this model we have investigated 3 oversampling techniques (Synthetic Minority Oversampler, Random Oversampler and Adaptive Synthetic Sampler) along with two undersampling techniques (Random Undersampler and Near Miss) which were paired with 3 data reduction and cleaning methods namely Tomek Links, One Sided Selection and Edited Nearest Neighbors. At last, we found that combination of Adaptive Synthetic Sampler along with One Sided Selection perform better in case of large size dataset while combination of random oversampler along with Tomek Link showed better performance in case of low size data dataset. We have also analyzed that oversampling technique gives quite promising results in comparison to undersampling methods specifically when applied with machine learning classifiers as these classifiers are data hungry algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call