Abstract

As many real-world data sets (e.g., social, financial, and medical data) are generated continuously and evolve with an ever-changing environment, the classification of data streams with concept drift has attracted increasing attention in machine learning and data mining. However, to the best of our knowledge, existing works mainly address concept drift while ignoring another common characteristic of real data: the heterogeneity caused by a mixture of numerical and categorical attributes. Tackling concept drift and heterogeneity together is far more challenging than dealing with either one alone. This paper therefore proposes an ensemble learning approach for classifying data with both numerical and categorical attributes (hereinafter called mixed data) under concept drift. We first design a unified metric that addresses the heterogeneity between numerical and categorical attributes. A base classifier that fuses the information provided by the heterogeneous attributes is then built on this metric. Furthermore, to adapt the classification to the complex concept drifts exhibited by the heterogeneous attributes, two types of base classifier ensembles are learned dynamically on the fly. Experimental results on various real mixed data sets with concept drift demonstrate the efficacy of the proposed method.
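The abstract does not specify the unified metric, the base classifier, or the ensemble update rules, so the sketch below is only a rough illustration that substitutes two generic, well-known techniques. It uses a Gower-style distance (range-normalised absolute difference for numerical attributes, 0/1 mismatch for categorical ones) inside a 1-nearest-neighbour base classifier, and a single chunk-based ensemble that re-weights its members by accuracy on the newest chunk so that members trained on an outdated concept lose influence. This is not the authors' method (which learns two types of ensembles); the names `gower_distance`, `NearestNeighbour`, `ChunkEnsemble` and the parameter `max_members` are hypothetical.

```python
from collections import Counter


def gower_distance(a, b, is_numeric, ranges):
    """Gower-style distance in [0, 1] between two mixed-attribute records.

    Numerical attributes contribute |a_i - b_i| / range_i (0 if the range is 0);
    categorical attributes contribute 0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for i, numeric in enumerate(is_numeric):
        if numeric:
            r = ranges[i]
            total += abs(float(a[i]) - float(b[i])) / r if r > 0 else 0.0
        else:
            total += 0.0 if a[i] == b[i] else 1.0
    return total / len(is_numeric)


class NearestNeighbour:
    """Hypothetical 1-NN base classifier built on the mixed distance."""

    def __init__(self, X, y, is_numeric):
        self.X, self.y, self.is_numeric = X, y, is_numeric
        # Ranges of the numerical attributes, estimated from this chunk only.
        self.ranges = [
            max(float(r[i]) for r in X) - min(float(r[i]) for r in X) if num else 0.0
            for i, num in enumerate(is_numeric)
        ]

    def predict(self, x):
        best = min(
            range(len(self.X)),
            key=lambda j: gower_distance(x, self.X[j], self.is_numeric, self.ranges),
        )
        return self.y[best]


class ChunkEnsemble:
    """Hypothetical accuracy-weighted ensemble trained chunk by chunk."""

    def __init__(self, is_numeric, max_members=5):
        self.is_numeric = is_numeric
        self.max_members = max_members
        self.members = []  # list of (weight, classifier) pairs

    def update(self, X, y):
        if not y:  # ignore empty chunks
            return

        def accuracy(clf):
            return sum(clf.predict(x) == t for x, t in zip(X, y)) / len(y)

        # Re-score existing members on the newest chunk to track drift.
        rescored = [(accuracy(clf), clf) for _, clf in self.members]
        # The new member cannot be scored fairly on its own training chunk,
        # so it simply receives the maximum weight of 1.0.
        rescored.append((1.0, NearestNeighbour(X, y, self.is_numeric)))
        rescored.sort(key=lambda pair: pair[0], reverse=True)
        self.members = rescored[: self.max_members]

    def predict(self, x):
        votes = Counter()
        for weight, clf in self.members:
            votes[clf.predict(x)] += weight
        return votes.most_common(1)[0][0]
```

For example, after an update with a chunk where small values with category `"a"` are labelled `"low"`, a later chunk with the labels flipped drives the old member's weight to 0, so the ensemble follows the new concept. Keeping only the `max_members` best-scoring members bounds memory on an unbounded stream.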
