Abstract

Classification is the problem of identifying the category (or class) of an observation whose class is unknown, using historical data. One important issue in classification is the class imbalance problem, which typically arises when the proportion of the target class is significantly smaller than that of the other classes. A traditional classifier often misclassifies instances from this target class, called the minority class, treating them as noise because there are so few of them. Modifying a classification algorithm to handle the class imbalance problem is challenging, especially for a random forest. In the random forest algorithm, the bootstrapping step generates several subsets of the training data by sampling uniformly at random with replacement. Most bootstrap subsets may not even contain instances from the minority class, which guarantees that the corresponding decision trees misclassify minority instances. A random forest algorithm, which must generate a bootstrap subset for each decision tree, therefore needs to ensure that minority instances are distributed across those subsets. This paper proposes a random forest algorithm that uses quartile-pattern bootstrapping, leveraging the mass-ratio-variance outlier factor and the minority condensation decision tree, to handle this problem. The mass-ratio-variance outlier factor is a score assigned to each instance: it is large for an outlier and small for an instance surrounded by other instances of the same class. To evaluate the proposed algorithm, two synthesized datasets are used in the experiments. The experimental results show significant improvement when a dataset is imbalanced: on the test data, the F1 score of the proposed algorithm is better than that of the traditional random forest algorithm.
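
To make the bootstrapping concern concrete, the following is a minimal sketch, not the paper's method, of how often a plain uniform bootstrap sample contains no minority instance at all. It assumes each bootstrap sample has the same size as the training set; the function names `prob_no_minority` and `simulate_no_minority` and the dataset sizes are hypothetical and chosen only for illustration.

```python
import random


def prob_no_minority(n_total: int, n_minority: int) -> float:
    """Exact probability that a uniform bootstrap sample of size n_total,
    drawn with replacement, contains no minority instance.

    Assumption: the bootstrap sample is as large as the training set.
    """
    return (1 - n_minority / n_total) ** n_total


def simulate_no_minority(n_total: int, n_minority: int,
                         n_trees: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability.

    Indices 0 .. n_minority - 1 stand for the minority instances;
    each simulated tree draws n_total indices with replacement.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_trees):
        # A tree "misses" the minority class if every drawn index
        # belongs to the majority class.
        if all(rng.randrange(n_total) >= n_minority for _ in range(n_total)):
            misses += 1
    return misses / n_trees


if __name__ == "__main__":
    # Hypothetical dataset sizes, for illustration only.
    for n_minority in (1, 2, 5):
        exact = prob_no_minority(200, n_minority)
        estimate = simulate_no_minority(200, n_minority)
        print(f"minority={n_minority}: exact={exact:.4f}, simulated={estimate:.4f}")
```

Under these assumptions the miss probability is governed mainly by the absolute number of minority instances, roughly exp(-m) for m minority instances in a large dataset, so the risk of a tree seeing no minority instance is greatest when the minority class has only a handful of members. Even when a sample does contain some minority instances, they remain heavily outnumbered, which is the situation a modified bootstrapping scheme is meant to address.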
