Classification Problem Using Imbalanced Ratio Based Random Forest Method

Wan Wang, Wai Kin Victor Chan, Junlin Guo

doi:10.1088/1742-6596/1883/1/012078

Abstract

In recent decades, as machine learning has grown in popularity across many fields, numerous classification methods have been proposed and applied to intelligent decision making. Most of these methods rely on a shallow machine learning classifier, and many of the algorithms already exist in standard machine learning packages. In this study, we propose a new Imbalanced Ratio Based Random Forest (IR-RF) method for classification and prediction. We focus on imbalanced data: we take several imbalanced datasets and feed the processed features into our IR-RF model. The three datasets we use come from the UCI machine learning database, and RF and IR-RF are compared in the final results. Here RF denotes the traditional random forest algorithm and IR-RF denotes our newly implemented method. Compared with other classification models, IR-RF is an algorithm-level method that is not limited by existing parameters: we can introduce new parameters within the algorithm and train our own classifier. Results show that, in our ratio-based comparison, the proposed method achieves higher accuracy than the traditional random forest method and takes better account of imbalanced conditions. The proposed method therefore not only achieves high accuracy but can also be applied to strongly imbalanced conditions, which opens a new window for practice.
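The abstract does not define the imbalanced ratio or say how IR-RF uses it. As a rough illustration only, the sketch below assumes the commonly used definition (majority-class count divided by minority-class count) and shows one generic way such class counts can be turned into per-class weights. The function names `imbalance_ratio` and `inverse_frequency_weights` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.

    Assumed definition for illustration; the paper may define it differently.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(labels):
    """Per-class weight n_samples / (n_classes * class_count).

    A generic reweighting heuristic, not the IR-RF procedure itself.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Toy labels (made up): 90 majority samples, 10 minority samples.
labels = [0] * 90 + [1] * 10
ir = imbalance_ratio(labels)
weights = inverse_frequency_weights(labels)
```

With these toy labels the minority class receives the larger weight, which is the general idea behind algorithm-level handling of imbalance: the classifier itself, rather than the data, is adjusted to compensate for skewed class counts.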

Full Text