Abstract

Big data analysis in the form of streaming computation remains an open problem, with relatively few research results and little practical experience to draw on. The random forest is one of the most widely used classification algorithms. In streaming computation scenarios, however, the real-time arrival, volatility, and disorder of the data gradually reduce the algorithm's accuracy. In this paper, the characteristics of the random forest algorithm are analyzed, and a method for pruning the random forest based on the accuracy of its individual decision trees is proposed. In addition, to adapt to changes in meteorological data, the concept of an accuracy interval is introduced to guide the generation, verification, and supplementation of new decision trees. The result is a random forest that is continuously updated as data arrive, meeting the requirements that a streaming big data environment places on the algorithm. Actual meteorological data are used to verify the feasibility of the improved method. The results show that the new method achieves higher classification accuracy in a real streaming big data scenario.
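
The abstract describes the update loop only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. Assumptions (all hypothetical): labelled data arrive in windows; each "decision tree" is a depth-1 stump fitted on a bootstrap sample using one randomly chosen feature; the accuracy interval is modelled as two thresholds, `low` (trees scoring below it on the newest window are pruned) and `high` (new candidate trees must reach it on held-out data before they are added). The names `Stump`, `train_stump`, `StreamingForest`, `update`, and `make_window` are illustrative only.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 decision tree, a stand-in for a full decision tree (assumption)."""
    feature: int
    threshold: float
    left_label: int
    right_label: int

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def majority(labels, default=0):
    """Most common label, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def accuracy(tree, X, y):
    """Fraction of samples in (X, y) that `tree` classifies correctly."""
    return sum(tree.predict(x) == label for x, label in zip(X, y)) / len(y)


def train_stump(X, y, rng):
    """Fit a stump on a bootstrap sample, splitting on one random feature."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    f = rng.randrange(len(X[0]))
    fallback = majority(yb)
    best_acc, best = -1.0, None
    for t in sorted({x[f] for x in Xb}):
        left = [lab for x, lab in zip(Xb, yb) if x[f] <= t]
        right = [lab for x, lab in zip(Xb, yb) if x[f] > t]
        stump = Stump(f, t, majority(left, fallback), majority(right, fallback))
        acc = accuracy(stump, Xb, yb)
        if acc > best_acc:
            best_acc, best = acc, stump
    return best


class StreamingForest:
    """Forest updated window by window (hypothetical reading of the abstract)."""

    def __init__(self, n_trees, low, high, seed=0):
        self.n_trees = n_trees
        self.low, self.high = low, high  # assumed accuracy interval [low, high]
        self.rng = random.Random(seed)
        self.trees = []

    def predict(self, x):
        # Majority vote over the current trees (0 if the forest is empty).
        return majority([t.predict(x) for t in self.trees])

    def update(self, X, y):
        """Prune, then generate, verify, and supplement using one labelled window."""
        # Pruning: drop trees whose accuracy on the newest window is below `low`.
        self.trees = [t for t in self.trees if accuracy(t, X, y) >= self.low]
        # Generation uses the first half of the window; verification the second.
        half = len(X) // 2
        X_train, y_train, X_val, y_val = X[:half], y[:half], X[half:], y[half:]
        attempts = 0
        while len(self.trees) < self.n_trees and attempts < 10 * self.n_trees:
            attempts += 1
            cand = train_stump(X_train, y_train, self.rng)
            # Verification: admit the candidate only if it reaches `high`.
            if accuracy(cand, X_val, y_val) >= self.high:
                self.trees.append(cand)  # supplementation
        return len(self.trees)


def make_window(rng, n, flipped=False):
    """Synthetic window: the label depends only on feature 0 (assumption)."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int((x[0] > 0.5) != flipped) for x in X]
    return X, y


# Usage: one ordinary window, then a window whose labelling rule has flipped.
data_rng = random.Random(42)
forest = StreamingForest(n_trees=5, low=0.6, high=0.9)
forest.update(*make_window(data_rng, 200))
forest.update(*make_window(data_rng, 200, flipped=True))  # simulated drift
```

In this sketch, trees that fit the old labelling rule score close to zero on the drifted window, fall below `low`, and are pruned; verified replacements are then trained on the new window. Holding candidates to the stricter `high` bar on held-out data, rather than on their own training data, is a design choice made here to avoid admitting overfitted trees.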
