Concept Drift Detection Research Articles

Purpose: The primary purpose of this paper is to introduce the drift detection method-online random forest (DDM-ORF) model for intrusion detection, which combines DDM for detecting concept drift with ORF for incremental learning. The paper addresses the challenges of dynamic, nonstationary data by offering a model that continuously adapts to changes in the data distribution. The goal is effective intrusion detection in real-world scenarios, demonstrated through experiments and evaluations on Apache Spark.
Design/methodology/approach: The paper takes an experimental approach to evaluating the DDM-ORF model. Classification performance is assessed with accuracy, precision, recall and F-measure. The model runs on Apache Spark for distributed computing, and throughput is measured as processed records per second and input rows per second. The evaluation also analyzes the IP addresses, ports and taxonomies in the MAWILab data set. Together, these experiments are used to show the model's effectiveness in detecting intrusions through concept drift detection and online incremental learning on large-scale, heterogeneous data.
Findings: The DDM-ORF model achieves 99.96% classification accuracy in intrusion detection. Compared with a convolutional neural network (CNN)-based model, it achieves higher detection rates for the anomalous and suspicious classes. The analysis of IP addresses, ports and taxonomies reveals useful insights into attack patterns, and the Apache Spark evaluation shows high processing rates. The study emphasizes the scalability, availability and fault tolerance of DDM-ORF, which make it suitable for real-world scenarios. Overall, the paper shows that the model handles dynamic, nonstationary data well for intrusion detection.
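The Purpose paragraph pairs DDM with an online random forest. To illustrate what the DDM half monitors, here is a minimal Python sketch of the standard Drift Detection Method of Gama et al. (2004): it tracks a classifier's running error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers where p + s was lowest, and signals a warning at p_min + 2·s_min and a drift at p_min + 3·s_min. The class name, the 30-instance warm-up, the string return values and the example error stream are choices made for this sketch; it is not the paper's Spark implementation, which the abstract does not show.

```python
import math
from dataclasses import dataclass


@dataclass
class DDM:
    """Drift Detection Method (Gama et al., 2004), minimal sketch.

    Feed it one boolean per prediction (True = misclassified). It tracks the
    running error rate p and s = sqrt(p * (1 - p) / n), remembers the point
    where p + s was lowest, and compares the current p + s against it.
    """
    min_instances: int = 30      # warm-up before any signal (assumed default)
    warning_level: float = 2.0   # warning at p_min + 2 * s_min
    drift_level: float = 3.0     # drift at p_min + 3 * s_min
    n: int = 0
    p: float = 0.0
    s: float = 0.0
    p_min: float = math.inf
    s_min: float = math.inf

    def reset(self) -> None:
        """Forget all statistics, e.g. after the model has been replaced."""
        self.n, self.p, self.s = 0, 0.0, 0.0
        self.p_min, self.s_min = math.inf, math.inf

    def update(self, error: bool) -> str:
        """Consume one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n  # incremental mean of 0/1 errors
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # start a fresh reference period after a drift
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical error stream: about 10% errors, then about 50% from index 200.
stream = [i % 10 == 0 for i in range(200)] + [i % 2 == 0 for i in range(200, 400)]
detector = DDM()
states = [detector.update(err) for err in stream]
print("first drift at index", states.index("drift"))
```

Because DDM sees only the stream of prediction errors, a drift signal says that the error rate rose, not why. In a DDM-ORF-style pipeline the "drift" signal is the natural point to update or partly rebuild the online learner; how the paper reacts to warnings and drifts is not described in this abstract.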
Research limitations/implications: The authors acknowledge several limitations. DDM may detect only changes in the frequency of class labels and miss more complex concept drifts. The incremental random forest's memory use grows with the size of the forest, which may become a constraint and may also lead to overfitting. These limitations could be addressed by exploring alternative concept drift detection algorithms and by applying ensemble pruning techniques to improve memory efficiency. Future work could investigate algorithms that balance accuracy against memory usage, such as compressed random forests, to improve the model's effectiveness as data evolve.
Practical implications: The proposed DDM-ORF model, designed for intrusion detection through concept drift detection and online incremental learning, offers a scalable, highly available and fault-tolerant solution. Running on Apache Spark and Microsoft Azure Cloud provides the processing capacity needed for large data sets in dynamic, nonstationary settings. The model's applicability to heterogeneous data sets and its high-accuracy multi-class classification make it suitable for real-world intrusion detection. The auto-scaling features of Microsoft Azure Cloud further allow resources to be used efficiently without downtime.
Social implications: The DDM-ORF model contributes to stronger cybersecurity. By providing an effective intrusion detection system, it helps safeguard digital ecosystems, protect user privacy and secure sensitive information. Its accuracy in identifying and classifying intrusion attempts helps mitigate cyber threats, fostering a safer online environment for individuals and organizations. More broadly, it strengthens the resilience of networks, systems and data against malicious activity, promoting trust and reliability in online interactions.
Originality/value: The DDM-ORF model introduces a novel approach to intrusion detection by combining drift detection (DDM) with online incremental learning (ORF), yielding a system that adapts as data evolve. Its contributions include scalability, fault tolerance and suitability for heterogeneous data sets in dynamic, nonstationary environments. Its application to a large-scale data set with multi-class classification, integrated with Apache Spark and Microsoft Azure Cloud, advances the understanding and practice of intrusion detection and offers insights for securing digital infrastructures.
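The limitations note that the incremental forest's memory grows with the number of trees and suggest ensemble pruning as a remedy. One simple pruning rule, sketched below in Python, caps the ensemble at max_size members and, when a new member arrives at a full ensemble, drops the member with the lowest accuracy over a sliding window of recent test-then-train predictions. The predict_one/learn_one interface, the class names and the toy Constant learner are hypothetical; this is a generic illustration, not the pruning scheme of the paper (which the abstract only proposes as future work) or of any particular library.

```python
from collections import Counter, deque
from typing import Any, Hashable, Optional, Protocol


class OnlineLearner(Protocol):
    """Assumed interface for an incremental base learner (e.g. one tree)."""
    def predict_one(self, x: Any) -> Hashable: ...
    def learn_one(self, x: Any, y: Hashable) -> None: ...


class BoundedEnsemble:
    """Online majority-vote ensemble capped at max_size members.

    When a new member is added to a full ensemble, the member with the lowest
    accuracy over its last `window` predictions is pruned first.
    """

    def __init__(self, max_size: int = 10, window: int = 100) -> None:
        self.max_size = max_size
        self.window = window
        self.members: list[OnlineLearner] = []
        self.hits: list[deque] = []  # recent 1/0 correctness per member

    def _recent_accuracy(self, i: int) -> float:
        h = self.hits[i]
        # Members with no history yet are treated optimistically (1.0)
        # so that they are not pruned before they have been evaluated.
        return sum(h) / len(h) if h else 1.0

    def add(self, model: OnlineLearner) -> None:
        if len(self.members) >= self.max_size:
            worst = min(range(len(self.members)), key=self._recent_accuracy)
            del self.members[worst]
            del self.hits[worst]
        self.members.append(model)
        self.hits.append(deque(maxlen=self.window))

    def predict_one(self, x: Any) -> Optional[Hashable]:
        votes = Counter(m.predict_one(x) for m in self.members)
        return votes.most_common(1)[0][0] if votes else None

    def learn_one(self, x: Any, y: Hashable) -> None:
        # Test-then-train: score each member on x before it learns from it.
        for model, h in zip(self.members, self.hits):
            h.append(int(model.predict_one(x) == y))
            model.learn_one(x, y)


class Constant:
    """Hypothetical toy learner that always predicts the same label."""
    def __init__(self, label: Hashable) -> None:
        self.label = label

    def predict_one(self, x: Any) -> Hashable:
        return self.label

    def learn_one(self, x: Any, y: Hashable) -> None:
        pass


ens = BoundedEnsemble(max_size=2, window=10)
ens.add(Constant("normal"))
ens.add(Constant("attack"))
for _ in range(10):
    ens.learn_one(None, "normal")  # every label in this toy stream is "normal"
ens.add(Constant("new"))  # ensemble is full: the always-wrong member is pruned
print([m.label for m in ens.members])  # → ['normal', 'new']
```

The cap keeps memory bounded regardless of stream length; the trade-off is that a member pruned during a temporary dip cannot return, which is one reason the abstract frames accuracy-versus-memory balancing as open work.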

AIOps (Artificial Intelligence for IT Operations) solutions use machine learning models and the massive data produced during the operation of large-scale systems to assist software engineers with system operations. Because operation data in the field evolve constantly, due to factors such as a changing operational environment and user base, the models in AIOps solutions must be maintained continually after deployment. While prior work focuses on innovative modeling techniques that improve the performance of AIOps models before they are released into the field, when and how to update AIOps models remains an under-investigated topic. In this work, we performed a case study on three large-scale public operation datasets: two trace datasets from the Google and Alibaba cloud computing platforms and one disk statistics dataset from the BackBlaze cloud storage data center. We empirically assessed five types of model update strategies for supervised learning in terms of performance, updating cost, and stability. We observed that active model update strategies (e.g., periodic retraining, concept drift guided retraining, time-based model ensembles, and online learning) achieve better and more stable performance than a stationary model. In particular, more sophisticated update strategies (e.g., concept drift detection, time-based ensembles, and online learning) could provide better performance, efficiency, and stability than simply retraining AIOps models periodically. We also observed that although some update strategies (e.g., time-based ensembles and online learning) save model training time, they significantly increase model testing time, which could hinder their use in AIOps solutions where operation data arrive at high speed and volume and immediate inferences are required.
Our findings highlight that practitioners should consider the evolution of operation data and actively maintain AIOps models over time. Our observations can also guide researchers and practitioners in investigating more efficient and effective model update strategies that fit in the context of AIOps.
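The abstract contrasts a stationary model with periodic and drift-guided retraining. The Python sketch below illustrates the mechanics on a deliberately tiny, hypothetical setup: a one-dimensional threshold classifier, synthetic periods whose class boundary shifts from 5 to 7 to 3, and labels assumed to become available at the end of each period. The "drift-guided" rule here simply retrains when accuracy on the latest period falls below a tolerance, a stand-in for a real concept drift detector; time-based ensembles and online learning are not modeled, and none of the names, data or numbers come from the study.

```python
from statistics import mean
from typing import Callable

Sample = tuple[float, int]             # (feature value, label 0/1)
Policy = Callable[[int, float], bool]  # (period index, accuracy) -> retrain?


def make_period(boundary: float) -> list[Sample]:
    """Hypothetical period of data: x in 0.0..10.0, label 1 iff x > boundary."""
    xs = [i / 10 for i in range(101)]
    return [(x, int(x > boundary)) for x in xs]


def fit_threshold(data: list[Sample]) -> float:
    """Toy model: threshold halfway between the largest 0 and smallest 1.
    Assumes both classes occur in `data`."""
    neg_max = max(x for x, y in data if y == 0)
    pos_min = min(x for x, y in data if y == 1)
    return (neg_max + pos_min) / 2


def accuracy(threshold: float, data: list[Sample]) -> float:
    return mean(int((x > threshold) == bool(y)) for x, y in data)


def run(periods: list[list[Sample]], should_retrain: Policy) -> tuple[list[float], int]:
    """Train on period 0, then for each later period: test first, and
    retrain on that period's labelled data if the policy says so."""
    model = fit_threshold(periods[0])
    accs, retrains = [], 0
    for t in range(1, len(periods)):
        acc = accuracy(model, periods[t])
        accs.append(acc)
        if should_retrain(t, acc):
            model = fit_threshold(periods[t])
            retrains += 1
    return accs, retrains


def stationary(t: int, acc: float) -> bool:
    return False  # never update after deployment


def periodic(k: int) -> Policy:
    return lambda t, acc: t % k == 0  # retrain every k periods


def drift_guided(tolerance: float) -> Policy:
    # Stand-in for a drift detector: retrain when accuracy falls below tolerance.
    return lambda t, acc: acc < tolerance


periods = [make_period(b) for b in [5, 5, 5, 7, 7, 7, 3, 3]]
for name, policy in [("stationary", stationary),
                     ("periodic(2)", periodic(2)),
                     ("drift-guided(0.95)", drift_guided(0.95))]:
    accs, n = run(periods, policy)
    print(f"{name:>20}: mean accuracy {mean(accs):.3f}, retrains {n}")
```

On this toy stream the drift-guided rule retrains twice and periodic retraining (every second period) three times, yet drift-guided retraining ends with the higher mean accuracy because it reacts exactly when the boundary moves. This only illustrates the trade-off the abstract describes; it says nothing about the Google, Alibaba or BackBlaze results.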

Articles published on Concept Drift Detection

Intrusion detection based on concept drift detection and online incremental learning

Data Poisoning Attack against Neural Network-Based On-Device Learning Anomaly Detector by Physical Attacks on Sensors

Hoeffding adaptive trees for multi-label classification on data streams

Detecting and rationalizing concept drift: A feature-level approach for understanding cause–effect relationships in dynamic environments

Estimating data complexity and drift through a multiscale generalized impurity approach

A benchmark and survey of fully unsupervised concept drift detectors on real-world data streams

Variance Feedback Drift Detection Method for Evolving Data Streams Mining

An artificial intelligence framework for explainable drift detection in energy forecasting

A Unified Framework for Detecting Gradual and Abrupt Concept Drifts in Data Stream Mining: The Concept Drift Detection Framework with Hybrid Meta-Learning (CDDF-HML)

Efficient and scalable covariate drift detection in machine learning systems with serverless computing

One or two things we know about concept drift-a survey on monitoring in evolving environments. Part A: detecting concept drift

Temporal Attention for Few-Shot Concept Drift Detection in Streaming Data

A novel framework for concept drift detection using autoencoders for classification problems in data streams

Application of concept drift detection and adaptive framework for non linear time series data from cardiac surgery

Online Concept Drift Detector: Optimally Balancing Delay Detection, Runtime, Memory, and Accuracy

On the Model Update Strategies for Supervised Learning in AIOps Solutions

Concept drift detection methods based on different weighting strategies

Ensemble framework for concept drift detection and class imbalance in data streams

A study on concept drift detection algorithms for real-world data streams

Concept Drift Mitigation in Low-Cost Air Quality Monitoring Networks