Abstract

In recent years, technological advances have led to an enormous and ever-increasing amount of data that is now commonly available in a streaming fashion. In such nonstationary environments, the underlying process generating the data stream is characterized by an intrinsic nonstationary (evolving or drifting) phenomenon known as concept drift. Given the increasingly common applications whose data generation mechanisms are susceptible to change, the need for effective and efficient algorithms for learning from and adapting to evolving or drifting environments can hardly be overstated. In dynamic environments associated with concept drift, learning models are frequently updated to adapt to changes in the underlying probability distribution of the data. Much of the work on learning in nonstationary environments focuses on updating the predictive model to optimize recovery from concept drift and convergence to new concepts by adjusting parameters and discarding poorly performing models. Little effort has been dedicated to investigating what type of learning model is suitable at any given time for different types of concept drift. In this paper, we investigate the impact of heterogeneous online ensemble learning based on online model selection for predictive modeling in dynamic environments. We propose a novel heterogeneous ensemble approach based on online dynamic ensemble selection that accurately interchanges between different types of base models in an ensemble to enhance its predictive performance in nonstationary environments. The approach, known as Heterogeneous Dynamic Ensemble Selection based on Accuracy and Diversity (HDES-AD), uses models generated by different base learners to increase diversity. This circumvents problems associated with existing dynamic ensemble classifiers, which may lose diversity because they exclude base learners generated by different base algorithms.
The algorithm is evaluated on artificial and real-world datasets against well-known homogeneous online ensemble approaches such as DDD, AFWE, and OAUE. The results show that HDES-AD performed significantly better than these three homogeneous online ensemble approaches in nonstationary environments.
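
To make the idea of selecting ensemble members by accuracy and diversity concrete, the following is a minimal sketch and not the HDES-AD algorithm itself, whose details are not given in this abstract. It assumes each base model's predictions over a recent window are available, measures accuracy on that window, and uses pairwise disagreement as the diversity measure. The function names, the greedy scoring rule (accuracy plus mean disagreement), the `min_acc` threshold, and the model names in the example are all hypothetical choices made for illustration.

```python
def windowed_accuracy(preds, labels):
    """Fraction of correct predictions over the recent window."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of window positions where two models predict differently."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_members(recent_preds, labels, k=2, min_acc=0.5):
    """Greedily pick up to k models (illustrative rule, not from the paper).

    recent_preds maps a model name to its predictions on the recent window.
    The most accurate model is chosen first; each further member maximizes
    its own accuracy plus its mean disagreement with the members chosen so far.
    """
    acc = {name: windowed_accuracy(p, labels) for name, p in recent_preds.items()}
    # Drop models that are clearly failing on the recent window (assumed threshold).
    candidates = [n for n in acc if acc[n] >= min_acc] or list(acc)
    selected = [max(candidates, key=lambda n: acc[n])]
    while len(selected) < min(k, len(candidates)):
        rest = [n for n in candidates if n not in selected]

        def score(name):
            diversity = sum(
                disagreement(recent_preds[name], recent_preds[s]) for s in selected
            ) / len(selected)
            return acc[name] + diversity

        selected.append(max(rest, key=score))
    return selected


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 1]
    recent_preds = {  # hypothetical heterogeneous base models
        "naive_bayes":    [1, 0, 1, 1, 0, 0],
        "hoeffding_tree": [1, 0, 1, 1, 0, 0],  # same errors as naive_bayes
        "perceptron":     [0, 0, 1, 1, 0, 1],  # equally accurate, different errors
        "knn":            [0, 1, 0, 1, 1, 0],  # inaccurate on this window
    }
    print(select_members(recent_preds, labels, k=2))
```

In this example the first three models are equally accurate on the window, but the perceptron is preferred over the tree as the second member because its errors fall on different examples. In an online setting the selection would be recomputed as the window slides, so the chosen subset can change after a drift.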

Highlights

  • Ensembles of classifiers have been successfully used in a variety of applications including text classification and extraction such as keyword extraction in text classification [1], text classification based on supervised clustering [2], text genre classification based on language function analysis, and feature engineering [3]

  • Even though it is well known that various types of predictive models can provide a very different predictive performance depending on the problem being tackled, little work has been dedicated to the investigation of what type of predictive model is most adequate over time in nonstationary environments where each example is learned separately upon arrival and discarded [5]

  • Therefore, this paper proposes an adaptive online heterogeneous ensemble learning algorithm for nonstationary environments based on dynamic ensemble selection, known as Heterogeneous Dynamic Ensemble Selection based on Accuracy and Diversity (HDES-AD)


Summary

Introduction

Ensembles of classifiers have been successfully used in a variety of applications including text classification and extraction such as keyword extraction in text classification [1], text classification based on supervised clustering [2], text genre classification based on language function analysis, and feature engineering [3]. The nonstationarity can be a result of, for example, seasonality or periodicity effects, changes in the user’s habits or preferences, and hardware or software faults affecting a cyber-physical system. In such nonstationary environments, where the probabilistic properties of the data change over time, a nonadaptive model trained under the false stationarity assumption is bound to become obsolete in time and perform suboptimally at best or fail catastrophically at worst [4]. Passive ensemble approaches do not use concept drift detection methods but maintain an ensemble of predictive models. This enables the algorithm to store base models of different forms of diversity and accuracy and use them to optimize prediction performance and adapt to concept drift accurately and in a timely manner.
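
The claim that a nonadaptive model becomes obsolete under drift can be illustrated with a toy stream. The sketch below is an assumption-laden illustration and is not taken from the paper: the stream, the two model classes, and the evaluation helper are all hypothetical. The concept flips abruptly halfway through the stream, and a model frozen after its initial training is compared with one that only remembers a sliding window of recent examples, using test-then-train (prequential) evaluation.

```python
from collections import Counter, deque


def make_stream(n=200, drift_at=100):
    """Yield (x, y) pairs; the concept flips from y = x to y = 1 - x at drift_at."""
    for t in range(n):
        x = t % 2
        yield x, (x if t < drift_at else 1 - x)


class FrozenModel:
    """Learns a lookup table from the first train_size examples, then stops updating."""

    def __init__(self, train_size=50):
        self.train_size = train_size
        self.seen = 0
        self.counts = {0: Counter(), 1: Counter()}

    def predict(self, x):
        counts = self.counts[x]
        return counts.most_common(1)[0][0] if counts else 0

    def learn(self, x, y):
        if self.seen < self.train_size:
            self.counts[x][y] += 1
            self.seen += 1


class SlidingWindowModel:
    """Predicts the majority label for x among the last `window` examples only."""

    def __init__(self, window=20):
        self.recent = deque(maxlen=window)

    def predict(self, x):
        counts = Counter(y for xi, y in self.recent if xi == x)
        return counts.most_common(1)[0][0] if counts else 0

    def learn(self, x, y):
        self.recent.append((x, y))


def prequential_accuracy(model, stream, start, stop):
    """Test-then-train over the stream; report accuracy for steps in [start, stop)."""
    correct = 0
    for t, (x, y) in enumerate(stream):
        if start <= t < stop:
            correct += model.predict(x) == y
        model.learn(x, y)
    return correct / (stop - start)


if __name__ == "__main__":
    for name, cls in [("frozen", FrozenModel), ("sliding window", SlidingWindowModel)]:
        before = prequential_accuracy(cls(), make_stream(), 50, 100)
        after = prequential_accuracy(cls(), make_stream(), 150, 200)
        print(f"{name}: before drift {before:.2f}, after drift {after:.2f}")
```

On this deliberately extreme stream both models are perfect before the drift, while afterwards the frozen model is wrong on every example and the sliding-window model recovers once its window contains only post-drift data. Real streams are noisier and drifts may be gradual or recurring, which is where maintaining several models of different types becomes relevant.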

Related Work
Drift Detection and Adaptation
Analysis of Heterogeneity and Significance Difference