Presence Of Concept Drift Research Articles

Background: Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated the suitability of certain methods in model training to prevent model degradation caused by data drift.

Methods: We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014–2019 to predict postoperative mortality using preoperatively available data. The models applied were Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we gave older data less weight, (2) we used only the most recent data for model training and (3) we performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.

Results: The models produced showed excellent areas under the receiver-operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January to March 2020, but significant degradation when tested on a dataset collected in the first wave of the COVID pandemic from April to May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although very different models emerged from them, using a large variety of parameters.

Conclusions: Our results show that none of our tested easy-to-implement measures in model training can prevent deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
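The three training modifications named in the Methods (down-weighting older data, training on recent data only, and z-transforming numerical inputs) can be illustrated with a minimal Python sketch. The abstract does not state the weighting function, the recency cut-off, or how the transformation was fitted, so the exponential half-life decay, the `min_year` threshold, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def age_decay_weights(years, half_life=2.0):
    """Give older cases less weight (assumed scheme: exponential decay).

    The newest year gets weight 1.0; a case `half_life` years older gets 0.5.
    """
    newest = max(years)
    return [0.5 ** ((newest - y) / half_life) for y in years]


def recent_only(rows, years, min_year):
    """Keep only the cases recorded in `min_year` or later (assumed cut-off)."""
    return [row for row, y in zip(rows, years) if y >= min_year]


def z_transform(values):
    """Standardize one numerical feature to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    years = [2014, 2016, 2018, 2019]
    ages = [71.0, 64.0, 58.0, 80.0]  # made-up feature values for illustration
    print(age_decay_weights(years))
    print(recent_only(ages, years, min_year=2018))
    print(z_transform(ages))
```

In a real pipeline the mean and standard deviation would be estimated on the training data and then reused for the test data. As the Results report, none of these measures prevented the loss of performance on the in-pandemic dataset.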

When put into practice in the real world, predictive maintenance presents a set of challenges for fault detection and prognosis that are often overlooked in studies validated with data from controlled experiments or numeric simulations. For this reason, this study aims to review the recent advancements in mechanical fault diagnosis and fault prognosis in the manufacturing industry using machine learning methods. For this systematic review, we searched Web of Science, ACM Digital Library, Science Direct, Wiley Online Library, and IEEE Xplore between January 2015 and October 2021. Full-length studies that employed machine learning algorithms to perform mechanical fault detection or fault prognosis in manufacturing equipment and presented empirical results obtained from industrial case studies were included, except for studies not written in English or published in sources other than peer-reviewed journals with JCR Impact Factor, conference proceedings and book chapters/sections. Of 4549 records, 44 primary studies were selected. In 37 of those studies, fault diagnosis and prognosis were performed using artificial neural networks (n = 12), decision tree methods (n = 11), hybrid models (n = 8), or latent variable models (n = 6), with one of the studies employing two different types of techniques independently. The remaining studies employed a variety of machine learning techniques, ranging from rule-based models to partition-based algorithms, and only two studies approached the problem using online learning methods. The main advantages of these algorithms include high performance, the ability to uncover complex nonlinear relationships and computational efficiency, while the most important limitation is the reduction in model performance in the presence of concept drift. This review shows that, although the number of studies performed in the manufacturing industry has been increasing in recent years, additional research is necessary to address the challenges presented by real-world scenarios.

Articles published on Presence Of Concept Drift

Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic

Online Machine Learning from Non-stationary Data Streams in the Presence of Concept Drift and Class Imbalance: A Systematic Review

An efficient and straightforward online vector quantization method for a data stream through remove-birth updating

An improved arithmetic optimization algorithm for training feedforward neural networks under dynamic environments

A Survey on Classifying Big Data with Label Noise

An effectiveness analysis of transfer learning for the concept drift problem in malware detection

An adaptive deep learning framework for day-ahead forecasting of photovoltaic power generation

Machine learning techniques applied to mechanical fault diagnosis and fault prognosis in the context of real industrial manufacturing use-cases: a systematic literature review

Learning Calibration Functions on the Fly: Hybrid Batch Online Stacking Ensembles for the Calibration of Low-Cost Air Quality Sensor Networks in the Presence of Concept Drift

An extensive study of C-SMOTE, a Continuous Synthetic Minority Oversampling Technique for Evolving Data Streams

An Adaptative Active Approach with Monitoring Quality of Stream Modeling in the Presence of Concept Drift

Real-Time Concept Drift Detection and Its Application to ECG Data

ADAW: Age decay accuracy weighted ensemble method for drifting data stream mining

Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift

Spam Detection Based on Feature Evolution to Deal with Concept Drift

Supervised learning in the presence of concept drift: a modelling framework

A clustering and ensemble based classifier for data stream classification

A Classification Approach Based on Divergence for Network Traffic in Presence of Concept Drift

Concept learning using one-class classifiers for implicit drift detection in evolving data streams

Mining frequent itemsets from streaming transaction data using genetic algorithms
