Using unreliable data for creating more reliable online learners

Leandro L. Minku, Xin Yao

doi:10.1109/ijcnn.2012.6252711

Abstract

Some machine learning applications raise the question of whether or not to use unreliable data for learning. Previous work shows that learners trained on unreliable data in addition to reliable data perform either similarly to or worse than learners trained solely on reliable data. Such learners frequently use unreliable data as if they were reliable, and they consider only the offline learning scenario. The present paper shows that it is possible to use unreliable data to improve performance in online learning scenarios where a pre-existing set of unreliable data is available. We propose an approach called Dynamic Un+Reliable data learners (DUR), which determines when unreliable data could be useful by maintaining a fixed-size weighted memory of learners trained on unreliable data. The weights represent how well these learners perform on the current concept and are updated throughout DUR's lifetime. DUR outperforms not only an approach that uses only reliable data, but also an approach that uses unreliable data as if they were reliable. Moreover, its variance in performance is lower than that of the approach that uses only reliable data. In other words, DUR is a more reliable learner.
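To make the idea of a "fixed-size weighted memory" of learners more concrete, the following is a minimal sketch under stated assumptions. It is not the DUR algorithm; the abstract does not say how DUR builds its learners, computes its weights, or combines predictions. Everything here is hypothetical and chosen only for illustration: the class and method names (`WeightedLearnerMemory`, `add`, `update`, `predict`), the running-accuracy weight update with a `decay` factor, the evict-the-lowest-weight policy, and the weighted-majority vote. Each stored learner is assumed to be a predictor that was already trained on some unreliable data. Its weight tracks how often it agrees with the labels of reliable examples arriving online, which serves as a rough proxy for "how well it performs on the current concept".

```python
from collections import defaultdict
from typing import Callable, Hashable, List

# Hypothetical stand-in for a learner already trained on unreliable data.
Predictor = Callable[[float], Hashable]


class WeightedLearnerMemory:
    """Illustrative fixed-size memory of learners with performance-based weights.

    Assumptions (not taken from the paper): weights are an exponentially
    smoothed accuracy on incoming reliable examples, the lowest-weight learner
    is evicted when the memory is full, and predictions use a weighted vote.
    """

    def __init__(self, max_size: int, decay: float = 0.9) -> None:
        self.max_size = max_size
        self.decay = decay  # assumed smoothing factor for the running accuracy
        self.learners: List[Predictor] = []
        self.weights: List[float] = []

    def add(self, learner: Predictor, initial_weight: float = 1.0) -> None:
        """Store a learner, evicting the lowest-weight one if the memory is full."""
        if len(self.learners) >= self.max_size:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.learners[worst]
            del self.weights[worst]
        self.learners.append(learner)
        self.weights.append(initial_weight)

    def update(self, x: float, y: Hashable) -> None:
        """Move each weight toward 1 if the learner is correct on (x, y), else toward 0."""
        for i, learner in enumerate(self.learners):
            correct = 1.0 if learner(x) == y else 0.0
            self.weights[i] = self.decay * self.weights[i] + (1 - self.decay) * correct

    def predict(self, x: float) -> Hashable:
        """Return the label with the largest total weight among the learners' votes."""
        votes: defaultdict = defaultdict(float)
        for learner, weight in zip(self.learners, self.weights):
            votes[learner(x)] += weight
        return max(votes, key=votes.get)


# Usage: two threshold classifiers stand in for learners built from unreliable sources.
memory = WeightedLearnerMemory(max_size=2)
memory.add(lambda x: int(x > 0.5))  # agrees with the current concept below
memory.add(lambda x: int(x > 0.8))  # learned from a mismatched source
for x, y in [(0.6, 1), (0.7, 1), (0.3, 0), (0.65, 1)]:
    memory.update(x, y)  # reliable examples arriving online
print([round(w, 4) for w in memory.weights])  # prints [1.0, 0.7461]
print(memory.predict(0.75))  # prints 1
```

In this sketch the learner whose data match the current concept keeps a high weight, while the mismatched learner's weight decays and it is the first to be evicted when the memory is full. A real system would also need a policy for when to create new learners and for how to combine them with a learner trained on reliable data. The abstract does not specify either, so the sketch leaves them out.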
