Abstract
With the tremendous growth of online data, a single classifier may perform poorly because of the wide variety of data domains. One solution is to learn a separate classifier for each domain. However, this would incur a huge cost for gathering annotated training data across a large number of domains, and it would ignore the similarity shared across domains. This leads to our problem setting: can labeled data from a related source domain help predict the unlabeled data in the target domain? In this paper, we consider two independent, simultaneous data streams, referred to as the source stream and the target stream. The target stream continuously generates unlabeled data instances from one domain, while the source stream continuously generates labeled data instances from another domain. Most likely, the two data streams have different but related feature spaces and different data distributions. Moreover, concept drifts may occur asynchronously in the two streams. Our problem setting, called Cross-domain Multistream Classification, is to predict the class labels of data instances in the target stream using a classifier trained on the labeled source stream. We propose an efficient solution for cross-domain multistream classification that integrates change detection into online data stream adaptation. The class labels of data instances in the target stream are predicted using the ample label information available in the related source stream, while concept drifts along the two independent streams are addressed continuously at the same time. Experimental results on real-world data sets indicate significantly improved performance over baseline methods.
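The problem setting can be illustrated with a minimal Python sketch. It is not the method proposed in the paper: it assumes, for simplicity, that both streams share one feature space (the abstract expects different but related ones), uses a nearest-centroid classifier as a stand-in learner, and uses a naive window-mean comparison as a stand-in change detector. All class, function, and parameter names (`OnlineCentroidClassifier`, `WindowMeanDriftDetector`, `run_multistream`, `window`, `threshold`) are hypothetical.

```python
from collections import deque


class OnlineCentroidClassifier:
    """Stand-in learner: nearest-centroid classifier updated per instance."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of instances seen

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_one(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.counts, key=sq_dist)


class WindowMeanDriftDetector:
    """Stand-in detector: compares a recent window mean to a reference mean."""

    def __init__(self, window=50, threshold=1.0):
        self.window = window
        self.threshold = threshold
        self.reference = []
        self.recent = deque(maxlen=window)

    def update(self, value):
        # Fill the reference window first (also after every detected drift).
        if len(self.reference) < self.window:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False
        ref_mean = sum(self.reference) / self.window
        cur_mean = sum(self.recent) / self.window
        if abs(cur_mean - ref_mean) > self.threshold:
            # Drift: rebuild the reference from post-drift data.
            self.reference = []
            self.recent.clear()
            return True
        return False


def run_multistream(source, target, window=50, threshold=1.0):
    """Train on the labeled source stream, predict the unlabeled target stream."""
    clf = OnlineCentroidClassifier()
    source_detector = WindowMeanDriftDetector(window, threshold)
    target_detector = WindowMeanDriftDetector(window, threshold)
    predictions, source_drifts, target_drifts = [], [], []
    for t, ((xs, ys), xt) in enumerate(zip(source, target)):
        # Assumption: each detector monitors only the first feature.
        if source_detector.update(xs[0]):
            source_drifts.append(t)
            clf = OnlineCentroidClassifier()  # relearn from post-drift source data
        clf.learn_one(xs, ys)
        predictions.append(clf.predict_one(xt))
        if target_detector.update(xt[0]):
            # A real method would adapt the source-to-target mapping here.
            target_drifts.append(t)
    return predictions, source_drifts, target_drifts


# Synthetic streams: two classes; only the target stream drifts, at t = 100.
source = [([5.0 * (t % 2), 5.0 * (t % 2)], t % 2) for t in range(200)]
target = [[5.0 * (t % 2) + (3.0 if t >= 100 else 0.0), 5.0 * (t % 2)]
          for t in range(200)]
predictions, source_drifts, target_drifts = run_multistream(source, target)
print("source drifts:", source_drifts, "target drifts:", target_drifts)
```

In this toy run only the target stream shifts, so the two detectors fire independently, which mirrors the asynchronous drift described above. Handling different feature spaces and adapting the classifier after a target drift are exactly the parts this sketch leaves out.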