Articles published on Data stream mining
- Research Article
- 10.1186/s40537-025-01147-0
- Jun 4, 2025
- Journal of Big Data
- Waruni Hewage + 2 more
Data stream mining is a critical process that organizations use to derive insights from real-time data, and preserving the privacy of sensitive information while maintaining high accuracy remains a persistent challenge. Privacy-preserving data mining techniques modify data to increase privacy, a process that invariably decreases the accuracy of data mining algorithms. Although different techniques have been proposed to preserve privacy, well-formulated frameworks for optimizing the trade-off between accuracy and privacy are lacking. This paper introduces a novel Accuracy-Privacy Optimization Framework (APOF) that allows users to define privacy requirements and predicts the achievable accuracy, enabling fine-tuning of this balance. Logistic cumulative noise addition, which showed better performance in experiments, is used as the data perturbation method, and Hoeffding trees are used as the classifier. Additionally, a data fitting module based on kernel regression is integrated, a unique approach that predicts accuracy levels from user-defined privacy thresholds. Experimental results show that the proposed framework achieves an optimal privacy level above 97% while minimising the accuracy loss across various datasets. By addressing critical gaps in privacy-preserving data mining, this study offers significant contributions to real-world applications, facilitating secure and efficient data utilization in dynamic environments.
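The data fitting module maps a user's privacy requirement to an expected accuracy. The abstract does not say which kernel regression estimator APOF uses, so the following is only a sketch that assumes a Nadaraya-Watson estimator with a Gaussian kernel; the function name `predict_accuracy`, the bandwidth, and the (privacy, accuracy) calibration pairs are hypothetical illustrations, not values from the paper.

```python
import math


def predict_accuracy(privacy_levels, accuracies, query, bandwidth=0.05):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (assumed estimator).

    Returns a weighted average of observed accuracies, where each weight
    reflects how close that observation's privacy level is to `query`.
    """
    if not privacy_levels or len(privacy_levels) != len(accuracies):
        raise ValueError("need equally sized, non-empty inputs")
    weights = [math.exp(-0.5 * ((p - query) / bandwidth) ** 2) for p in privacy_levels]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from the observed privacy levels for this bandwidth")
    return sum(w * a for w, a in zip(weights, accuracies)) / total


# Hypothetical calibration runs: privacy level achieved vs. classifier accuracy.
privacy = [0.90, 0.93, 0.95, 0.97, 0.99]
accuracy = [0.88, 0.86, 0.84, 0.81, 0.75]
estimate = predict_accuracy(privacy, accuracy, query=0.96)
print(f"predicted accuracy at 96% privacy: {estimate:.3f}")
```

Because the prediction is a weighted average, it always stays within the range of observed accuracies; the bandwidth controls how strongly nearby calibration runs dominate.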
- Research Article
- 10.54097/6m7rrt90
- Apr 29, 2025
- Frontiers in Computing and Intelligent Systems
- Min Wang
To address the challenges posed by concept drift in streaming data mining, this paper proposes an Ensemble Multi-Model Voting Method for Adapting to Concept Drift (EMVM_ATCD). The method integrates multiple classifiers to improve model stability, updates the model through online learning, and adds a dropout layer that forces the model to learn different combinations, enhancing its generalization ability. A voting mechanism aggregates the models' predictions to improve the method's ability to cope with concept drift. Experimental results show performance improvements ranging from 0.1% to 10% on multiple datasets, indicating that the method can effectively handle various types of data.
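EMVM_ATCD combines its base classifiers' predictions through a voting mechanism. The abstract does not say whether votes are weighted, so the sketch below assumes plain majority voting with deterministic tie-breaking; the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the tied label that appears earliest in
    `predictions`, so the result is deterministic.
    """
    if not predictions:
        raise ValueError("no predictions to vote on")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Predictions for one incoming instance from three hypothetical online models.
print(majority_vote(["drift", "stable", "drift"]))  # drift
```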
- Research Article
- 10.1109/tnnls.2024.3382033
- Mar 1, 2025
- IEEE transactions on neural networks and learning systems
- Jiarui Sun + 3 more
A growing number of applications generate streaming data, making data stream mining a popular research topic. Classification-based streaming algorithms require pre-training on labeled data. Manually labeling a large number of samples in the data stream is impractical and cost-prohibitive. Stream clustering algorithms rely on unsupervised learning. They have been widely studied for their ability to effectively analyze high-speed data streams without prior knowledge. Stream clustering plays a key role in data stream mining. Currently, most data stream clustering algorithms adopt the online-offline framework. In the online stage, micro-clusters are maintained, and in the offline stage, they are clustered using an algorithm similar to density-based spatial clustering of applications with noise (DBSCAN). When data streams have clusters with varying densities and ambiguous boundaries, traditional data stream clustering algorithms may be less effective. To overcome the above limitations, this article proposes a fully online stream clustering algorithm called fast boundary peeling stream clustering (FBPStream). First, FBPStream defines a decay-based kernel density estimation (KDE). It can discover clusters with varying densities and identify the evolving trend of streams well. Then, FBPStream implements an efficient boundary micro-cluster peeling technique to identify the potential core micro-clusters. Finally, FBPStream employs a parallel clustering strategy to effectively cluster core and boundary micro-clusters. The proposed algorithm is compared with ten popular algorithms on 15 data streams. Experimental results show that FBPStream is competitive with the other ten popular algorithms.
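FBPStream starts from a decay-based kernel density estimate, in which older points contribute less density so the estimate follows the stream's evolving trend. The abstract does not give FBPStream's exact decay or kernel; this sketch assumes the exponential fading function 2^(-λ·age) used in density-based stream clustering such as DenStream, combined with a Gaussian kernel. The names `decayed_density`, `lam`, and `bandwidth` are hypothetical.

```python
import math


def decayed_density(points, timestamps, query, t_now, lam=0.1, bandwidth=1.0):
    """Decay-weighted Gaussian KDE evaluated at `query`.

    Each point is weighted by 2 ** (-lam * age), so recent points dominate.
    Kernel normalising constants are omitted: the value is an unnormalised
    density, which is enough for comparing regions of the same stream.
    """
    total = 0.0
    for x, t in zip(points, timestamps):
        fade = 2.0 ** (-lam * (t_now - t))
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        total += fade * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


# Two old points near the origin and one fresh point far away (times in ticks).
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
times = [0, 1, 10]
print(decayed_density(points, times, (0.0, 0.0), t_now=10))
print(decayed_density(points, times, (5.0, 5.0), t_now=10))
```

With `lam=0.1`, a point's weight halves every 10 ticks, so an old dense region can end up with a lower density than a newer, sparser one.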
- Research Article
- 10.1109/tcyb.2024.3489605
- Feb 1, 2025
- IEEE transactions on cybernetics
- Hang Yu + 4 more
One challenge of learning from streaming data is that only a limited number of labeled examples are available, which makes semi-supervised learning (SSL) algorithms an efficient tool for streaming data mining. Recently, graph-based SSL algorithms have been proposed to improve SSL performance, because the graph structure can exploit the interactions between neighboring nodes. However, graph-based SSL algorithms have two main limitations when applied to streaming data. First, not all labels in streaming data are necessarily reliable, and direct classification using a graph can lead to suboptimal performance. Second, graph-based SSL algorithms assume that the graph structure is static, whereas the learning environment of streaming data is dynamic. Hence, we propose a competence-aware graph neural network (CA-GNN) to deal with these two limitations. Unlike other models, CA-GNN does not rely directly on graph information that could include mislabeled nodes. Instead, a competence model is used to explore latent semantic correlations in the streaming data and capture the reliability of each data point. A streaming learning strategy then evolves CA-GNN's parameters to capture the dynamics of the graph sequences. We conducted experiments on seven real datasets and four synthetic datasets and compared the outcomes across various methods. The results demonstrate that CA-GNN classifies streaming data more effectively than the state-of-the-art (SOTA) methods.
- Research Article
- 10.1109/tcyb.2025.3605663
- Jan 1, 2025
- IEEE transactions on cybernetics
- Bin Zhang + 3 more
A frequent problem in data stream mining is concept drift, meaning the data distribution changes over time. A common issue when dealing with concept drift is insufficient data. Real-world applications of data stream mining often involve multiple data streams. However, most concept drift methods handle these data streams separately. This study uses data from other data streams to handle the problem of insufficient data. We propose a novel Multistream Concept Drift Handling Framework via data sharing, containing a fuzzy membership-based drift detection (FMDD) component and a fuzzy membership-based drift adaptation (FMDA) component, to train the new learning model for drifting streams by sharing weighted data from other nondrifting streams. A stream fuzzy set is defined with membership functions that measure the degree to which samples belong to a data stream. Our Concept Drift Handling Framework can detect when and in which stream concept drift occurs, and therefore the insufficient data issue can be solved by adding the weighted data from nondrifting streams to train new learning models. Synthetic and real-world experimental results show that our method can help avoid the insufficient data issue and thereby significantly improve the prediction performance.
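The framework trains a new model for a drifting stream by borrowing samples from non-drifting streams, weighted by fuzzy membership. The abstract does not define the membership functions, so the following is a purely hypothetical sketch: membership decays with the Euclidean distance between a sample and the mean of the drifting stream's recent window, and every name here (`membership`, `shared_training_set`, `scale`) is illustrative rather than taken from FMDD or FMDA.

```python
import math


def membership(sample, stream_window, scale=1.0):
    """Hypothetical fuzzy membership of `sample` in a stream, in [0, 1].

    Equals 1 at the mean of `stream_window` and decays with Euclidean distance.
    """
    if not stream_window:
        raise ValueError("stream window is empty")
    dim = len(sample)
    centre = [sum(row[j] for row in stream_window) / len(stream_window) for j in range(dim)]
    return math.exp(-(math.dist(sample, centre) / scale) ** 2)


def shared_training_set(drift_window, other_streams, scale=1.0):
    """Pair each training sample with a weight.

    The drifting stream's own post-drift samples get weight 1.0; samples from
    non-drifting streams get their membership in the drifting stream.
    """
    weighted = [(x, 1.0) for x in drift_window]
    for stream in other_streams:
        for x in stream:
            weighted.append((x, membership(x, drift_window, scale)))
    return weighted
```

The resulting weights could be passed to any learner that accepts per-sample weights, so borrowed samples that look unlike the drifting stream barely influence the new model.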
- Research Article
- 10.1109/access.2025.3611957
- Jan 1, 2025
- IEEE Access
- Maria Yesenia Zavaleta-Sanchez + 4 more
A Framework for Dynamic User Modeling Integrating Data Stream Mining and Process Mining in Educational Contexts
- Research Article
- 10.1016/j.asoc.2024.112353
- Oct 18, 2024
- Applied Soft Computing
- Yousef Abdi + 2 more
An ensemble-based semi-supervised learning approach for non-stationary imbalanced data streams with label scarcity
- Research Article
- 10.1007/s10994-024-06621-z
- Oct 9, 2024
- Machine Learning
- Bozhidar Stevanoski + 3 more
An essential characteristic of data streams is the possible occurrence of concept drift, i.e., a change in the distribution of the data in the stream over time. The capability to detect and adapt to changes is thus a necessity for data stream mining methods. While methods for multi-target prediction on data streams have recently appeared, they have largely lacked this capability. In this paper, we propose novel methods for change detection and adaptation in the context of incremental online learning of decision trees for multi-target regression. One of the proposed approaches is ensemble based, while the other uses the Page–Hinckley test. We perform an extensive evaluation of the proposed methods on real-world and artificial data streams and show their effectiveness. We also demonstrate their utility on a case study from spacecraft operations, where cosmic events can cause change and demand appropriate and timely positioning of the spacecraft.
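The Page–Hinckley test named above tracks the cumulative deviation m_T = Σ (x_t − x̄_t − δ) of a monitored signal from its running mean and raises an alarm when m_T − min_t m_t exceeds a threshold λ. The abstract does not say which signal the tree monitors, so this sketch assumes a stream of prediction errors and detects increases in their mean; the `delta` and `threshold` defaults are hypothetical and would need tuning.

```python
import random


class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # m_T
        self.min_cum = 0.0          # min over t <= T of m_t

    def update(self, x):
        """Add one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(detector, values):
    """Index of the first alarm, or None if no change is detected."""
    for i, v in enumerate(values):
        if detector.update(v):
            return i
    return None


# Simulated per-instance errors: small before index 500, large afterwards.
rng = random.Random(42)
errors = [rng.gauss(0.1, 0.05) for _ in range(500)] + [rng.gauss(1.0, 0.05) for _ in range(500)]
print(first_alarm(PageHinkley(delta=0.005, threshold=5.0), errors))
```

For multi-target regression, one such detector per target, or a single detector on an aggregated error, are both plausible designs; the abstract does not say which one the authors use.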
- Research Article
- 10.32913/mic-ict-research.v2024.n2.1249
- Sep 15, 2024
- ICT Research
- Minh-Thai Tran + 3 more
Mining valuable patterns in data streams presents a significant challenge in the field of data mining. This task is crucial because it allows highly profitable item sets to be identified within transaction databases. However, as new transactions are continually added, new valuable patterns emerge, changing the usefulness of previously analyzed data. Information about these changes must be updated promptly to enable effective business decision-making. Yet existing mining methods applied to transaction stream datasets require considerable time to identify new patterns and update information related to new transactions. This article proposes a new transaction stream mining method called High-Utility Stream Linked-List Mining. The method uses a linked-list structure known as the High-Utility Stream Linked List (HUSLL) to store information about patterns in the database, and mining and updating of transaction information are performed directly on the HUSLL structure. Experimental results demonstrate that the new method achieves shorter execution times than previous solutions.
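High-utility mining ranks item sets by utility rather than frequency. Under the standard definition, an item's utility in a transaction is its quantity times its unit profit, and an item set's utility is summed over the transactions that contain all of its items. The sketch below shows only that measure, using brute-force enumeration over a small window; it is not the HUSLL structure, and the names, profits, and `min_util` threshold are hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transaction, profit):
    """Utility of `itemset` in one transaction (a dict item -> quantity).

    Returns 0 unless every item is present; otherwise sums quantity * unit profit.
    """
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)


def high_utility_itemsets(window, profit, min_util):
    """Brute-force search over a small window of transactions.

    Returns {itemset: total utility} for every itemset with utility >= min_util.
    """
    items = sorted({item for t in window for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = sum(itemset_utility(itemset, t, profit) for t in window)
            if u >= min_util:
                result[itemset] = u
    return result


profit = {"a": 5, "b": 2, "c": 1}
window = [{"a": 1, "b": 2}, {"a": 2, "c": 6}, {"b": 4, "c": 3}]
print(high_utility_itemsets(window, profit, min_util=12))
# {('a',): 15, ('b',): 12, ('a', 'c'): 16}
```

In this example, ('a', 'c') passes the threshold even though ('c',) alone does not: utility is not anti-monotone, which is one reason stream methods need dedicated structures instead of simple frequent-pattern pruning. Each time a transaction enters or leaves a window, these utilities change, and that update cost is what the abstract targets.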
- Research Article
- 10.14445/23488379/ijeee-v11i7p103
- Jul 31, 2024
- International Journal of Electrical and Electronics Engineering
- Gollanapalli V Prasad + 1 more
The dynamic nature of data streams poses major challenges for sustaining prediction model accuracy over time. Concept drift, defined as changes in the underlying data distribution, has been shown to considerably affect the performance of machine learning models in real-time applications. While earlier methods often focus on either slow or abrupt concept drift, a unified framework capable of quickly identifying both types is lacking. To address this gap, we propose the Concept Drift Detection Framework with Hybrid Meta-Learning (CDDF-HML). The method combines meta-learning, adaptive feature selection, and an ensemble-based process to address both slow and sudden concept drift, which makes the framework well suited to dynamic data stream mining, where the underlying structure changes continually. The study shows how the framework identifies concept drift and accommodates various data conditions, and a comparative analysis with other techniques demonstrates that CDDF-HML is an effective tool for detecting concept drift. Future directions for CDDF-HML include applying the method within specific domains, developing finer-grained adjustment approaches, making structural and extensional amendments to improve scalability, and partnering with professionals from various industries. The framework improves concept drift detection in data stream mining, helping to ensure model reliability in dynamic data settings.
- Research Article
- 10.1016/j.compeleceng.2024.109420
- Jul 14, 2024
- Computers and Electrical Engineering
- Shirin Khezri + 2 more
An experimental review of the ensemble-based data stream classification algorithms in non-stationary environments
- Research Article
- 10.1007/s40747-024-01524-x
- Jun 20, 2024
- Complex & Intelligent Systems
- Lisha Hu + 3 more
Stream data mining aims to handle the continuous, ongoing generation of data flows (e.g. weather, stock and traffic data), which often undergo concept drift as time progresses. Traditional offline algorithms struggle to learn from real-time data, which makes online algorithms better suited to mining stream data with dynamic concepts. Among online learning algorithms, single-pass methods stand out for their efficiency: they process one sample point at a time and inspect it at most once. Existing online algorithms achieve a single pass over stream data by converting classification problems into minimum enclosing ball problems. However, these methods mainly focus on expanding the ball to enclose new data. An excessively large ball might overwrite data of the new concept, making it difficult to trigger the model updating process. This paper proposes a new online single-pass framework for stream data mining, namely Scalable Concept Drift Adaptation (SCDA), and presents three distinct online methods (SCDA-I, SCDA-II and SCDA-III) based on that framework. These methods dynamically adjust the ball by expanding or contracting it when new sample points arrive, thereby avoiding excessively large balls. To evaluate their performance, we conduct experiments on 7 synthetic and 5 real-world benchmark datasets and compare the methods with state-of-the-art approaches. The experiments demonstrate the applicability and flexibility of the SCDA methods in stream data mining across three aspects: predictive performance, memory usage and scalability of the ball. SCDA-III performs best in all three aspects.
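The expand step that the abstract criticises can be sketched with the classic single-pass update. When a point falls outside the ball, the ball is replaced by the smallest ball that encloses both the old ball and the new point. SCDA's contraction rule is not described in the abstract. The `shrink` factor below is a hypothetical stand-in, not the authors' method: it lets the radius decay toward the distance of the newest inside point, but never below it.

```python
import math


class StreamingBall:
    """Single-pass enclosing ball with a hypothetical contraction step."""

    def __init__(self, first_point, shrink=1.0):
        self.centre = list(first_point)
        self.radius = 0.0
        self.shrink = shrink  # hypothetical factor in (0, 1]; 1.0 disables contraction

    def update(self, point):
        d = math.dist(self.centre, point)
        if d > self.radius:
            # Expand: smallest ball containing the old ball and the new point.
            # The centre moves (d - r) / 2 toward the point; new radius is (r + d) / 2.
            step = (d - self.radius) / 2.0 / d
            self.centre = [c + step * (p - c) for c, p in zip(self.centre, point)]
            self.radius = (self.radius + d) / 2.0
        else:
            # Hypothetical contraction: shrink, but keep the new point inside.
            self.radius = max(d, self.shrink * self.radius)


ball = StreamingBall((0.0, 0.0))
for p in [(2.0, 0.0), (1.0, 3.0), (0.5, 0.5)]:
    ball.update(p)
    print(ball.centre, ball.radius)
```

With `shrink=1.0`, the ball only grows and always contains every point seen so far, which reproduces the "excessively large ball" behaviour the abstract describes. Values below 1.0 trade that guarantee for responsiveness to new concepts.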
- Research Article
- 10.31272/jae.i139.1099
- Jun 9, 2024
- Journal of Administration and Economics
- Prof Dr Husam Abulrazzak + 1 more
Algorithms and techniques for analyzing complex data are used in a growing number of fields, and with this growth come the challenges of handling multiple, more complex data types. Research directions vary with the diversity of these fields, and these techniques are increasingly used in artificial intelligence, which aims to make human life easier in many areas. Mining of complex data types includes mining time series, symbolic sequences, and biological sequences, as well as graphs, computer networks, mobile data, text, and data streams.
- Research Article
- 10.52783/jes.4235
- Jun 1, 2024
- Journal of Electrical Systems
- Anita A Parmar
With the growth of big data streams, data stream mining has attracted considerable research attention, as it has a wide range of applications such as banking, education, networking, telecommunication, weather forecasting, and the stock market. As a result, privacy preservation in data stream mining is receiving increasing attention from researchers. This paper reviews privacy-preserving classification methods for data streams, which apply classification algorithms to big data streams while ensuring data privacy. The recently emerging context of big data analytics has shed new light on this research area.
- Research Article
- 10.1016/j.ins.2024.120575
- Apr 10, 2024
- Information Sciences
- Piotr Duda + 2 more
Accelerating deep neural network learning using data stream methodology
- Research Article
- 10.1145/3639285
- Mar 12, 2024
- Proceedings of the ACM on Management of Data
- Xiaochen Li + 6 more
Top-k frequent item detection is a fundamental task in data stream mining. Many promising solutions have been proposed to improve memory efficiency while maintaining high accuracy in detecting the Top-k items. Beyond memory efficiency, users may suffer privacy loss if they participate in the task without proper protection, since their contributed local data streams may continually leak sensitive individual information. However, most existing works address either the memory-efficiency problem or the privacy concern but seldom both, and therefore cannot achieve a satisfactory tradeoff between memory efficiency, privacy protection, and detection accuracy. In this paper, we present a novel framework, HG-LDP, that achieves accurate Top-k item detection at bounded memory expense while providing rigorous local differential privacy (LDP) protection. Specifically, we identify two key challenges that naturally arise in the task, which reveal that directly applying existing LDP techniques leads to an inferior "accuracy-privacy-memory efficiency" tradeoff. We therefore instantiate three advanced schemes under the framework by designing novel LDP randomization methods, which address the hurdles caused by the large size of the item domain and by the limited memory space. We conduct comprehensive experiments on both synthetic and real-world datasets to show that the proposed advanced schemes achieve a superior "accuracy-privacy-memory efficiency" tradeoff, saving 2300× memory over baseline methods when the item domain size is 41,270. Our code is anonymously open-sourced via the link.
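For context on the baseline that HG-LDP improves upon, the sketch below shows generalised randomised response (GRR), a standard LDP frequency oracle. Each user keeps the true item with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 items uniformly at random. The server debiases counts as (count − n·q) / (p − q), with q = 1 / (e^ε + k − 1). This is not one of HG-LDP's schemes. It keeps a counter for every item in the domain, which is exactly the memory cost the paper targets. The function names, domain, and ε value are hypothetical.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: report the true value w.p. p, else a uniform other value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def estimate_top_k(reports, domain, epsilon, k_top):
    """Server side: unbiased frequency estimates, then the k_top largest."""
    n, k = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    counts = Counter(reports)
    est = {v: (counts[v] - n * q) / (p - q) for v in domain}
    return sorted(domain, key=lambda v: est[v], reverse=True)[:k_top]


# Hypothetical skewed stream: 'a' and 'b' dominate a 10-item domain.
domain = list("abcdefghij")
data = ["a"] * 10000 + ["b"] * 6000 + [v for v in "cdefghij" for _ in range(500)]
rng = random.Random(0)
reports = [grr_perturb(v, domain, 2.0, rng) for v in data]
print(estimate_top_k(reports, domain, epsilon=2.0, k_top=2))
```

As the domain size k grows, p shrinks and the estimator's variance rises. This is one way to see why the large item domain is a hurdle for naive LDP Top-k detection.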
- Research Article
- 10.3233/idt-230065
- Feb 20, 2024
- Intelligent Decision Technologies
- Hanqing Hu + 2 more
Explainable Machine Learning brings explainability, interpretability, and accountability to data mining algorithms. Existing explanation frameworks focus on explaining the decision process of a single model on a static dataset. However, in data stream mining, changes in the data distribution over time, called concept drift, may require updating the learning models to reflect the current data environment. It is therefore important to go beyond static models and understand what has changed between the learning models before and after a concept drift. We propose a Data Stream Explanability framework (DSE) that works together with a typical data stream mining framework in which support vector machine (SVM) models are used. DSE aims to help non-expert users understand model dynamics in a concept-drifting data stream. DSE visualizes differences between SVM models before and after concept drift to explain why the new model fits the data better. A survey on the effectiveness of the framework was carried out among expert and non-expert users. Although non-expert users on average reported less understanding of the issue than expert users, the difference was not statistically significant. This indicates that DSE successfully brings explainability of model change to non-expert users.
- Research Article
- 10.1016/j.is.2024.102351
- Feb 17, 2024
- Information Systems
- Jie Lu + 5 more
SuperGuardian: Superspreader removal for cardinality estimation in data streaming
- Research Article
- 10.1002/sam.11662
- Feb 1, 2024
- Statistical Analysis and Data Mining: The ASA Data Science Journal
- Zahra Nouri + 2 more
Today's ever‐increasing generation of streaming data demands novel data mining approaches tailored to mining dynamic data streams. Data streams are non‐static in nature, continuously generated, and endless. They often suffer from class imbalance and undergo temporal drift. To address the classification of consecutive data instances within imbalanced data streams, this research introduces a new ensemble classification algorithm called Rarity Updated Ensemble with Oversampling (RUEO). The RUEO approach is specifically designed to be robust against class imbalance: it incorporates an imbalance‐specific criterion to assess the efficacy of the base classifiers and employs an oversampling technique to reduce the imbalance in the training data. The RUEO algorithm was evaluated on a set of 20 data streams and compared against 14 baseline algorithms. The proposed RUEO algorithm achieves an average accuracy of 0.69 on the real‐world data streams, while the chunk‐based algorithms AWE, AUE, and KUE achieve average accuracies of 0.48, 0.65, and 0.66, respectively. The statistical analysis, conducted using the Wilcoxon test, reveals a statistically significant improvement in average accuracy for the proposed RUEO algorithm compared with 12 of the 14 baseline algorithms. The source code and experimental results of this research work will be publicly available at https://github.com/vkiani/RUEO.
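RUEO reduces imbalance in its training data through oversampling. The abstract does not name the specific technique, so the following sketch assumes the simplest option: random oversampling within each incoming chunk, duplicating minority-class examples until every class matches the majority count. The function name `random_oversample` and the `(x, y)` chunk format are hypothetical.

```python
import random
from collections import Counter


def random_oversample(chunk, rng):
    """Balance a chunk of (x, y) pairs by duplicating random minority examples.

    Every class is topped up to the size of the largest class in the chunk.
    """
    if not chunk:
        return []
    by_class = {}
    for x, y in chunk:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(examples) for examples in by_class.values())
    balanced = list(chunk)
    for examples in by_class.values():
        balanced.extend(rng.choice(examples) for _ in range(target - len(examples)))
    return balanced


rng = random.Random(1)
chunk = [((float(i),), 0) for i in range(8)] + [((100.0 + i,), 1) for i in range(2)]
print(Counter(y for _, y in random_oversample(chunk, rng)))
```

Duplicating examples adds no new information and can encourage overfitting to the few minority points. Synthetic approaches such as SMOTE are a common alternative, and the paper may use a different technique.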
- Research Article
- 10.3390/bdcc8020016
- Jan 31, 2024
- Big Data and Cognitive Computing
- Maryam Badar + 1 more
Fairness-aware mining of data streams is a challenging concern in the contemporary domain of machine learning. Many stream learning algorithms are used to replace humans in critical decision-making processes, e.g., hiring staff, assessing credit risk, etc. This calls for handling massive amounts of incoming information with minimal response delay while ensuring fair and high-quality decisions. Although deep learning has achieved success in various domains, its computational complexity may hinder real-time processing, making traditional algorithms more suitable. In this context, we propose a novel adaptation of Naïve Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance through multi-objective optimization (MOO). Class imbalance is an inherent problem in discrimination-aware learning paradigms. To deal with class imbalance, we propose a dynamic instance weighting module that gives more importance to new instances and less importance to obsolete instances based on their membership in a minority or majority class. We have conducted experiments on a range of streaming and static datasets and concluded that our proposed methodology outperforms existing state-of-the-art (SoTA) fairness-aware methods in terms of both discrimination score and balanced accuracy.