Abstract

The random forest algorithm is a flexible, easy-to-use machine learning method that is widely applied to classification problems. However, the traditional random forest has some limitations. Because the randomness it injects into the decision trees occurs almost exclusively in the feature selection performed while the trees are generated, the fixed tree-generation rules can lead to relatively severe overfitting. Moreover, on high-dimensional and imbalanced data, the algorithm's performance degrades sharply, because high-dimensional data usually contains many irrelevant and redundant features. To address these problems, we propose FSRF, an improved random forest algorithm. Building on the traditional random forest, we apply feature selection to preprocess the data and obtain the feature subset with the best classification performance for constructing the forest. We also introduce sparse matrix projection to improve the generation of the random forest. Experiments show that our method reduces the influence of redundant features on classification and improves accuracy.
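The paper itself does not include code, but the pipeline described above (feature-selection preprocessing, a sparse projection step, then a random forest) can be approximated with off-the-shelf components. The sketch below is a minimal illustration assuming scikit-learn; the mutual-information filter, the choice of k=20 selected features, and the projection dimensionality are illustrative assumptions, not the authors' FSRF implementation, in which the sparse projection is applied inside the tree-generation process rather than as a separate pipeline stage.

```python
# Hypothetical FSRF-like pipeline: feature selection + sparse random
# projection + random forest, built from scikit-learn components.
# This is NOT the authors' implementation; all parameter choices are
# illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import SparseRandomProjection

# Synthetic high-dimensional data with many irrelevant/redundant features.
X, y = make_classification(n_samples=1000, n_features=500,
                           n_informative=25, n_redundant=50,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

fsrf_like = make_pipeline(
    # Feature-selection preprocessing: keep the k most informative features.
    SelectKBest(mutual_info_classif, k=20),
    # Sparse matrix projection of the selected features.
    SparseRandomProjection(n_components=10, random_state=0),
    # Standard random forest on the projected representation.
    RandomForestClassifier(n_estimators=200, random_state=0),
)

fsrf_like.fit(X_tr, y_tr)
print("test accuracy:", fsrf_like.score(X_te, y_te))
```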
