Abstract

Nonstationary streaming data are characterized by changes in the underlying distribution between subsequent time steps. Learning in such environments becomes even more challenging when labeled data are available only at the initial time step and the algorithm receives only unlabeled data thereafter, a scenario referred to as extreme verification latency. Our previously introduced COMPOSE framework performs very well in such settings. COMPOSE is a semi-supervised approach that iteratively labels strategically chosen instances of the next time step using the instances it labeled in the previous time step. COMPOSE originally assumed a significant distribution overlap between consecutive time steps, allowing instances lying in the center of the feature space to serve as the most representative labeled instances from the current time step to help label the new data at the next time step. A similar assumption is inherent in importance-weighting-based domain adaptation, but only for a single time step with mismatched training and test data distributions. We explore importance weighting not for matching training and test distributions at a single time step, but rather for matching distributions between two consecutive time steps, and estimate the posterior distribution of the unlabeled data using an importance-weighted least-squares probabilistic classifier. The estimated labels are then iteratively used as the training data for the next time step. We call this algorithm LEVELIW: Learning Extreme VErification Latency with Importance Weighting. Our primary goal is to determine if and when importance weighting provides an advantage over COMPOSE's core support extraction, and whether it offers an alternative solution with reduced parameter sensitivity. We compare the two approaches on several datasets, yielding some unique insights.
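The iterative importance-weighting loop described above can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it assumes the stream arrives as a list of feature batches with labels known only for the first batch, estimates the density ratio w(x) = p_{t+1}(x)/p_t(x) with a common probabilistic-classifier surrogate rather than a dedicated density-ratio estimator, and substitutes a sample-weighted logistic regression for the importance-weighted least-squares probabilistic classifier. The names `batches`, `y0`, and `density_ratio_weights` are hypothetical.

```python
# A minimal sketch of iterative importance-weighted label propagation under
# extreme verification latency. Not the authors' code; weighted logistic
# regression stands in for the importance-weighted least-squares
# probabilistic classifier used in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_weights(X_cur, X_next):
    """Estimate w(x) ~ p_next(x) / p_cur(x) for each x in X_cur by training
    a classifier to discriminate current-step from next-step samples."""
    X = np.vstack([X_cur, X_next])
    z = np.concatenate([np.zeros(len(X_cur)), np.ones(len(X_next))])
    disc = LogisticRegression(max_iter=1000).fit(X, z)
    p = disc.predict_proba(X_cur)[:, 1]         # P(next step | x)
    odds = p / np.clip(1.0 - p, 1e-12, None)
    return odds * (len(X_cur) / len(X_next))    # correct for batch-size imbalance

def propagate_labels(batches, y0):
    """Propagate labels through an unlabeled stream, one time step at a time."""
    X_cur, y_cur = batches[0], y0
    predictions = []
    for X_next in batches[1:]:
        w = density_ratio_weights(X_cur, X_next)
        # The weighted fit matches the current training distribution to the
        # next step's distribution before predicting the next step's labels.
        clf = LogisticRegression(max_iter=1000).fit(X_cur, y_cur, sample_weight=w)
        y_next = clf.predict(X_next)
        predictions.append(y_next)
        X_cur, y_cur = X_next, y_next           # predicted labels become training data
    return predictions
```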
