A principle component analysis-based random forest with the potential nearest neighbor method for automobile insurance fraud identification

Yaqi Li, Chun Yan, Wei Liu, Maozhen Li

doi:10.1016/j.asoc.2017.07.027

Abstract

As a successful ensemble method, Random Forest has attracted much attention. In this paper, individual classifiers are combined into a multiple classifier system with improved classification accuracy. Following Breiman’s methodology, we propose a multiple classifier system based on the Random Forest, Principal Component Analysis and Potential Nearest Neighbor methods. As Breiman suggested, the performance of a Random Forest depends on the strength of the weak learners in the forest and the diversity among them. Principal Component Analysis is applied at each node to transform the data into another space before the best split at that node is computed. This process increases the diversity among the trees in the forest and thereby improves the overall accuracy. The Random Forest is also studied from the perspective of adaptive nearest neighbors. We introduce the concepts of monotone distance measures and potential nearest neighbors, and show that the Random Forest can be viewed as an adaptive learning mechanism of k Potential Nearest Neighbors. To account for the information loss caused by out-of-bag samples, a new voting mechanism based on Potential Nearest Neighbors is presented to replace the traditional majority vote. The proposed algorithm improves the classification accuracy of the ensemble classifier by increasing the diversity among the base classifiers. Its performance is compared with that of the Oblique Decision Tree Ensemble, Rotation Forest and basic Random Forest on the same data sets. The experimental results show that the proposed method achieves better classification accuracy and lower variance. The proposed method is also applied to automobile insurance fraud detection, from which fraud rules are obtained.
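The nearest-neighbor view of a forest can be illustrated with a small sketch. Under the usual reading of potential nearest neighbors, a training sample is a neighbor of a query point when the two fall into the same terminal node of some tree. The abstract does not specify the authors' voting rule, so the code below is only an assumed illustration: every training sample (in-bag or out-of-bag) is dropped through every tree, each sample is weighted by the number of trees in which it shares a leaf with the query, and the class with the largest total weight wins. The function name `pnn_vote`, the leaf-id tables, and the toy labels are all hypothetical.

```python
from collections import Counter, defaultdict


def pnn_vote(train_leaves, train_labels, query_leaves):
    """Weighted vote over the training samples that share a leaf with the query.

    train_leaves[t][i] is the leaf id reached by training sample i in tree t.
    query_leaves[t] is the leaf id reached by the query in tree t.
    The co-occurrence weighting is an assumption, not the paper's stated rule.
    """
    weights = defaultdict(int)
    for tree_leaves, q_leaf in zip(train_leaves, query_leaves):
        for i, leaf in enumerate(tree_leaves):
            if leaf == q_leaf:
                # Sample i lands in the same terminal node as the query in this tree.
                weights[i] += 1

    if not weights:
        raise ValueError("query shares no leaf with any training sample")

    votes = Counter()
    for i, w in weights.items():
        votes[train_labels[i]] += w

    return votes.most_common(1)[0][0], dict(weights)


# Toy forest: 3 trees, 5 training samples, leaf ids are arbitrary integers.
train_labels = ["fraud", "legit", "legit", "fraud", "legit"]
train_leaves = [
    [0, 0, 1, 1, 2],  # tree 0
    [3, 4, 4, 3, 4],  # tree 1
    [5, 5, 6, 6, 6],  # tree 2
]
query_leaves = [1, 3, 6]

label, weights = pnn_vote(train_leaves, train_labels, query_leaves)
print(label, weights)
```

Unlike a plain majority vote over tree outputs, this scheme lets samples that were out-of-bag for a given tree still contribute through the leaves they reach, which is one plausible way to reduce the information loss the abstract mentions.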

Full Text