Abstract

When classifying unbalanced data, the random forest (RF) algorithm suffers from poor classification performance and overly large decision trees (DTs). In the era of big data, RF algorithms also need to handle large-scale data. To address RF's weakness on unbalanced data, this paper improves the feature selection method built into RF and proposes a new feature selection algorithm. Randomness is introduced on top of a feature importance ranking, so that each tree stays strong while the correlation between trees is reduced. On the extended transform data set, the sensitivity of the RF model exceeded 0.8, while that of the other models rose to about 0.65. The centralized RF model predicted the companies' credit ratings with 100% accuracy, whereas the CART model misjudged companies C6 and C7 and the Logit model misjudged companies C3, C5, and C8. The experiments demonstrate the RF model's ability to extrapolate and its strong predictive performance. In practical applications within the applied mathematics specialty, the RF optimization algorithm proposed in this study handles continuous variables well and improves the classification accuracy of RF. The paper argues that the RF algorithm's advantages in data processing and model performance will lead to its wider use in enterprise credit risk evaluation.
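The abstract does not spell out how randomness is combined with the feature importance ranking. As a minimal sketch of one plausible reading, and not the paper's actual algorithm, the candidate features for a tree could be drawn without replacement with probability proportional to their importance scores. The function name `sample_features`, its parameters, and the example scores below are hypothetical and are not taken from the paper.

```python
import random


def sample_features(importances, k, rng=None):
    """Draw k distinct feature indices, favouring higher importance scores.

    Illustrative assumption only: weighted sampling without replacement is
    one way to mix an importance ranking with randomness. The paper's own
    selection rule is not given in the abstract.
    """
    rng = rng or random.Random()
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of features")
    if any(w < 0 for w in importances):
        raise ValueError("importance scores must be non-negative")

    remaining = list(range(len(importances)))  # feature indices still available
    weights = [float(w) for w in importances]  # weights aligned with `remaining`
    chosen = []
    for _ in range(k):
        if sum(weights) > 0:
            # Pick one position with probability proportional to its weight.
            pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        else:
            # Only zero-importance features are left: fall back to uniform.
            pos = rng.randrange(len(remaining))
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return chosen


# Hypothetical importance scores for five features; draw three per tree.
subset = sample_features([0.5, 0.3, 0.1, 0.1, 0.0], k=3, rng=random.Random(0))
```

Under this assumption, highly ranked features appear in most trees, which supports the strength of each tree, while the random draw varies the subsets from tree to tree, which lowers the correlation between trees.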
