Abstract
In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. Because enough data representing the new concept must be collected before that concept can be learnt well, existing models usually take some time to recover from concept drift. To speed up recovery from concept drift and improve predictive performance in data stream mining, this work proposes a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie). Melanie is the first approach able to transfer knowledge between multiple data streaming sources in non-stationary environments. It creates several sub-classifiers to learn different aspects from different source and target concepts over time. The sub-classifiers that match the current target concept well are identified and used to compose an ensemble for predicting examples from the target concept. We evaluate Melanie on several synthetic data streams containing different types of concept drift and on real-world data streams. The results indicate that Melanie can deal with a variety of drifts and improve predictive performance over existing data stream learning algorithms by making use of multiple sources.
Highlights
Many real world applications produce data in a streaming fashion, i.e., as a sequence of observations that arrive over time
We propose a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie)
This paper aims to answer the following research question: can multi-source transfer learning improve predictive performance in data stream mining, and if so, when and why? To that end, we propose Melanie
Summary
Many real-world applications produce data in a streaming fashion, i.e., as a sequence of observations that arrive over time. One reason concept drift makes learning from such streams harder is that, when a previously unseen joint probability distribution is encountered, existing approaches depend on the arrival of new data to learn an appropriate model of this new distribution. The accuracy of such approaches tends to be poor during the period in which insufficient data has been received for training. A possible solution to this issue is to use information learned from different sources to speed up the learning of a new target concept, and thereafter improve the accuracy of the estimation. This is called transfer learning [2]. Transfer learning has the potential to speed up adaptation to concept drift, improving predictive performance in data stream mining.
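The abstract describes identifying the sub-classifiers that match the current target concept and composing an ensemble from them. The following is a minimal sketch of that general idea only, not Melanie's actual algorithm: the class name `SelectiveEnsemble`, the sliding `window`, the `min_accuracy` threshold, and the accuracy-weighted vote are all assumptions made for illustration, and the sub-classifiers are fixed callables rather than models trained online from source and target streams.

```python
from collections import defaultdict, deque


class SelectiveEnsemble:
    """Hypothetical sketch: keep only sub-classifiers that fit recent target data."""

    def __init__(self, sub_classifiers, window=20, min_accuracy=0.6):
        # Each sub-classifier is any callable x -> label (e.g. learnt on a source).
        self.sub_classifiers = list(sub_classifiers)
        self.min_accuracy = min_accuracy
        # One sliding window of 0/1 hits per sub-classifier, on target examples.
        self.hits = [deque(maxlen=window) for _ in self.sub_classifiers]

    def _accuracy(self, i):
        h = self.hits[i]
        return sum(h) / len(h) if h else 0.0

    def selected(self):
        """Indices of sub-classifiers whose recent target accuracy is high enough."""
        return [i for i in range(len(self.sub_classifiers))
                if self._accuracy(i) >= self.min_accuracy]

    def predict(self, x):
        # Fall back to all sub-classifiers when none passes the threshold yet.
        chosen = self.selected() or range(len(self.sub_classifiers))
        votes = defaultdict(float)
        for i in chosen:
            # Accuracy-weighted vote; tiny floor so unseen members still count.
            votes[self.sub_classifiers[i](x)] += max(self._accuracy(i), 1e-9)
        return max(votes, key=votes.get)

    def update(self, x, y):
        """Record whether each sub-classifier was right on a labelled target example."""
        for i, clf in enumerate(self.sub_classifiers):
            self.hits[i].append(1 if clf(x) == y else 0)


if __name__ == "__main__":
    # Three toy sub-classifiers standing in for knowledge from different sources.
    clfs = [lambda x: int(x > 0.5), lambda x: int(x < 0.5), lambda x: 0]
    ens = SelectiveEnsemble(clfs)
    xs = [0.1, 0.9, 0.3, 0.7] * 5
    for x in xs:                      # target concept before drift
        ens.update(x, int(x > 0.5))
    print(ens.selected())
    for x in xs:                      # target concept after an abrupt drift
        ens.update(x, int(x < 0.5))
    print(ens.selected())
```

In this toy run, the selected set moves from the first sub-classifier to the second once the window fills with post-drift examples, which illustrates why reusing knowledge that already matches the new concept can shorten recovery compared with learning it from scratch.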