Abstract
With the rise of fields such as e-commerce platforms, a large volume of stream data has emerged. Incomplete labeling and concept drift in these data pose a major challenge to existing stream data classification methods. To address this, a dynamic classification algorithm for stream data is proposed. For the incomplete labeling problem, the method introduces randomization and an iterative strategy on top of the very fast decision tree (VFDT) algorithm to design an iterative integration (ensemble) algorithm: each model's classification result is used as input to the next model, and new data are classified by a voting mechanism. At the same time, a window mechanism stores the data and computes the distribution characteristics of the data in the window; the size of the sliding window is then adjusted based on this result combined with the predicted amount of data. Experiments show the superiority of the algorithm in classification accuracy. The aim of the study is to compare different algorithms to evaluate whether the classification model adapts to the current data environment.
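The iterative, vote-based scheme described in the abstract can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: the base learner here is a simple nearest-centroid classifier standing in for VFDT, and the names `NearestCentroid`, `train_iterative_ensemble`, `vote`, `n_models`, and `sample_frac` are hypothetical and do not come from the paper. Each round trains on a random subsample of the labeled data plus the unlabeled data tagged with the previous model's predictions, and new data are classified by majority vote.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (NOT a VFDT): predicts the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def squared_distance(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=squared_distance)


def train_iterative_ensemble(labeled, unlabeled, n_models=5, sample_frac=0.8, seed=0):
    """Train models in sequence; each round reuses the previous model's
    predictions on the unlabeled data as pseudo-labels (an assumed reading
    of "previous model classification result as the next model input")."""
    rng = random.Random(seed)
    models, pseudo = [], []
    for _ in range(n_models):
        k = max(1, int(sample_frac * len(labeled)))
        train = rng.sample(labeled, k) + pseudo  # randomization step
        X = [x for x, _ in train]
        y = [label for _, label in train]
        model = NearestCentroid().fit(X, y)
        models.append(model)
        pseudo = [(x, model.predict_one(x)) for x in unlabeled]
    return models


def vote(models, x):
    """Classify x by majority vote over all models in the ensemble."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]
```

In a real stream setting the stand-in learner would be replaced by an incremental decision tree, and the ensemble would be retrained periodically as new windows of data arrive.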
Highlights
With the development of the Internet, sensors, and the Internet of Things, massive streaming data has emerged.
Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using stream processing techniques, without access to all of the data at once. It comes in a variety of formats, such as log files generated by web applications, online shopping data, traffic monitoring data, social networking site information, and geospatial and meteorological satellite data. These stream data carry a large amount of information that is instructive for real-world decision-making. Therefore, many scholars analyze the data to obtain useful information that guides scientific decisions in areas such as personalized real-time recommendation on e-commerce platforms, stock market monitoring, network intrusion detection, fraud and anomaly monitoring, smart device applications, traffic monitoring, and real-time motion analysis.
To address the above problems, this paper proposes a dynamic classification algorithm for real-time streaming data. It adopts a decision tree as the base classification model and combines it with the idea of an integrated (ensemble) classification algorithm, moving away from a single classification model and updating the classification model periodically. To detect concept drift, the degree of drift is measured by calculating the difference between the data in the front part and the rear part of the sliding window.
Summary
With the development of the Internet, sensors, and the Internet of Things, massive streaming data has emerged. As an important branch of data mining, classification has important practical significance in fields such as financial credit rating, telecom fraud prevention, and network intrusion detection [1]. Some existing data mining schemes and algorithms fail to fully consider the characteristics of stream data and its practical application scenarios, such as concept drift, incomplete labeling, and uneven data flow rate. To address these problems, this paper proposes a dynamic classification algorithm for real-time streaming data. It adopts a decision tree as the base classification model and combines it with the idea of an integrated (ensemble) classification algorithm, moving away from a single classification model and updating the classification model periodically. To detect concept drift, the degree of drift is measured by calculating the difference between the data in the front part and the rear part of the sliding window.
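The window-based drift check described above can be sketched in Python as follows. The summary does not state which difference measure is used, so this sketch assumes one: the absolute difference between the means of the front and rear halves, scaled by the standard deviation of the whole window. The resize rule, the function names `drift_score` and `next_window_size`, and the parameters `threshold`, `min_size`, and `max_size` are likewise hypothetical illustrations, not the paper's method.

```python
from statistics import mean, pstdev


def drift_score(window):
    """Compare the front half and rear half of a window of numeric values.

    Assumed measure: |mean(rear) - mean(front)| / std(window).
    Returns 0.0 for windows that are too short or have no spread.
    """
    values = list(window)
    half = len(values) // 2
    if half == 0:
        return 0.0
    front, rear = values[:half], values[half:]
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    return abs(mean(rear) - mean(front)) / spread


def next_window_size(current, score, threshold=1.0, min_size=20, max_size=1000):
    """Hypothetical resize rule: shrink the window when drift is detected so
    stale data is dropped sooner, and grow it slowly while data is stable."""
    if score > threshold:
        return max(min_size, current // 2)
    return min(max_size, current + current // 4)
```

For example, a window whose first half is all 0.0 and whose second half is all 5.0 yields a score of 2.0 under this measure, which exceeds the default threshold and halves the window.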