Abstract

This paper takes educational data mining as its research theme. It mines existing large-scale education data, compares the analysis methods of existing data models, and proposes an improved random forest (RF) reference model. A feature weighting scheme is introduced to calculate the information gain of each feature, and evaluation indices are used to improve on existing data analysis. Simulation results show that the improved model is more efficient than existing classification models. To resolve the performance bottleneck of a single node in multiple data classification tasks in the era of big data, a classification and prediction model for large-scale graduate employment data, based on a distributed improved RF algorithm, is proposed. The MapReduce distributed computing framework is used to serialize the trained model to, and deserialize it from, the local disk and the distributed file system, thereby realizing the distributed extension of the large-scale data classification model based on the improved RF model.
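
As an illustration of the information-gain calculation mentioned above, the following is a minimal sketch for categorical features. The abstract does not specify the paper's actual weighting scheme, so normalizing the gains to sum to one is an assumption made here for illustration, and the function names `entropy`, `information_gain`, and `feature_weights` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(columns, labels):
    """Assumed weighting: each feature's gain divided by the sum of all gains."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    total = sum(gains.values())
    if total == 0:
        # No feature is informative; fall back to uniform weights.
        return {name: 1 / len(gains) for name in gains}
    return {name: g / total for name, g in gains.items()}
```

Under this assumption, a feature that separates the classes perfectly receives a larger weight than one that carries no information about the label. Such weights could then bias which features are sampled when growing each tree.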
