Abstract

Data mining is the process of extracting useful information from large and complex databases. In real-world scenarios, data sources contain many varied kinds of data, including imbalanced data. An imbalanced data set contains a large percentage of instances from one class and a very small percentage of instances from the other class. The traditional decision tree algorithm Iterative Dichotomiser 3 (ID3) was not designed to handle imbalanced data sets. To overcome this drawback of ID3, improved algorithms are needed. In this paper, we propose an extension of the ID3 algorithm called Over Sampled ID3 (OSID3) for imbalanced data learning. The proposed OSID3 approach uses oversampling with a unique statistical oversampling strategy: it removes less privileged instances at an early stage and later oversamples the highly privileged instances to achieve approximate data balance. Experimental observations suggest that the proposed approach improves on the benchmark ID3 in terms of Accuracy, Area Under Curve (AUC) and Root Mean Square Error (RMSE) on 15 imbalanced data sets from the University of California, Irvine (UCI) repository.
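
To make the idea of approximate data balance concrete, the following is a minimal sketch of plain random oversampling, assuming a two-class data set held in Python lists. It is not the OSID3 strategy, whose statistical selection of instances is not detailed in this abstract; the function names `imbalance_ratio`, `entropy` and `random_oversample` are hypothetical and chosen only for illustration. The entropy function is the standard class entropy on which ID3's information gain is based.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution, as used by ID3."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority instances until all classes match
    the majority class count. Illustrative stand-in only, not OSID3."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Instances of this class in the original (unsampled) data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 9 majority instances and 1 minority instance (made-up values).
rows = [[i] for i in range(10)]
labels = ["neg"] * 9 + ["pos"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

On this toy input the class ratio falls from 9:1 to 1:1 and the class entropy rises to its two-class maximum of 1 bit. A decision tree learner would then be trained on the balanced rows rather than the original ones.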
