Abstract
Beyond applying machine learning predictive models to static tasks, a significant corpus of research applies them to streaming environments that incur concept drift. With the prevalence of streaming real-world applications associated with changes in the underlying data distribution, the need for applications capable of adapting to evolving, time-varying dynamic environments can hardly be overstated. Dynamic environments are nonstationary, and the target variables to be predicted by the learning algorithm often evolve with time, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from drift, while little effort has been dedicated to formulating a learning system capable of learning different types of drifting concepts at any time with minimum overhead. This work proposes a novel, evolving data stream classifier called the Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly optimizes adaptation to different types of concept drift at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers and optimizes the margin distribution to exploit ensemble diversity, yielding an ensemble classifier that generalizes well to unseen instances and recovers quickly from different types of concept drift. Empirical experiments on both artificial and real-world data streams demonstrate that ADES can adapt to different types of drift at any given time. The prediction performance of ADES is compared with that of three other ensemble classifiers designed to handle concept drift, using both artificial and real-world data streams, and the comparative evaluation demonstrates the ability of ADES to handle different types of concept drift.
The experimental results, including statistical tests, show that ADES performs comparably to other algorithms designed to handle concept drift and demonstrate its significance and effectiveness.
Highlights
Beyond applying machine learning predictive models to static tasks, a significant corpus of research exists that applies machine learning predictive models to streaming environments that incur concept drift
Empirical experiments conducted on both artificial and real-world data streams demonstrate that Adaptive Diversified Ensemble Selection (ADES) can adapt to different types of drifts at any given time. The prediction performance of ADES is compared to three other ensemble classifiers designed to handle concept drift using both artificial and real-world data streams. The comparative evaluation performed demonstrated the ability of ADES to handle different types of concept drifts. The experimental results, including statistical test results, indicate comparable performances with other algorithms designed to handle concept drift and prove their significance and effectiveness
We show that ADES can perform well in concept drift scenarios with minimum overhead, as classifiers with the same pattern recognition ability are separated. The experiments conducted on concept drift scenarios show that adaptation to different types of concept drift requires different levels of diversity, and that promptly adapting to recurring concepts requires the storage of previously learned knowledge
Summary
Received 4 February 2021; Revised 26 April 2021; Accepted 11 May 2021; Published 1 June 2021. The prediction performance of ADES is compared to three other ensemble classifiers designed to handle concept drift using both artificial and real-world data streams. To analyze how ADES behaves when it encounters different types of drift using high- and low-diversity ensembles in dynamic environments, we first perform an empirical analysis on artificial datasets and then reaffirm it using real-world data streams. Detecting recurring concepts requires the manipulation of a series of streaming data chunks and takes O(Nm|Zv|) time, where m is the ensemble size and |Zv| is the number of labeled instances in the validation set used to evaluate the equivalence level between two concepts based on supervised information. For datasets with more than 100 000 instances, ADES executes faster, and for datasets that exhibit abrupt concept drift and contain fewer than 500 000 instances, ADES executed faster than Dynse
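To make the O(Nm|Zv|) cost of the recurring-concept check more concrete, the following is a minimal sketch, not the ADES implementation. It assumes that N counts the stored candidate concepts (the summary does not define N), that each stored concept is an ensemble of m classifiers, and that "equivalence level" can be approximated by majority-vote accuracy on the labeled validation set Zv. The names `majority_vote`, `ensemble_accuracy`, `find_recurring_concept`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types: a classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]
Ensemble = List[Classifier]
Labeled = Tuple[Sequence[float], int]


def majority_vote(ensemble: Ensemble, x: Sequence[float]) -> int:
    """Query all m base classifiers once and return the most common label."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]


def ensemble_accuracy(ensemble: Ensemble, validation: List[Labeled]) -> float:
    """Accuracy on the validation set: m * |Zv| classifier calls."""
    if not validation:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for x, y in validation if majority_vote(ensemble, x) == y)
    return correct / len(validation)


def find_recurring_concept(
    stored: List[Ensemble],
    validation: List[Labeled],
    threshold: float = 0.8,  # assumed equivalence cut-off, not from the source
) -> Optional[int]:
    """Return the index of the best stored ensemble that reaches the
    threshold on the validation set, or None if no stored concept matches.

    Scanning N stored ensembles costs N * m * |Zv| classifier calls.
    """
    best_index: Optional[int] = None
    best_acc = threshold
    for i, ensemble in enumerate(stored):
        acc = ensemble_accuracy(ensemble, validation)
        if acc >= best_acc:
            best_index, best_acc = i, acc
    return best_index
```

Under these assumptions, every stored ensemble is scored on the same validation instances, so the work grows linearly in each of N, m, and |Zv|, which matches the stated bound.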