Extreme Verification Latency Research Articles

Existing data stream mining algorithms assume the availability of labelled and balanced data streams. However, in many real-world applications such as robotics, weather monitoring, fraud detection, cyber security, and human activity recognition, Internet of Things sensors generate vast amounts of high-speed data, and real-time data on the Internet are unlabelled. Furthermore, prediction models need to learn in non-stationary environments because concepts evolve. Manual labelling of these data streams is impractical because it requires domain expertise and a prohibitive amount of time and resources. Existing approaches to such scenarios are self-learning and Cluster-Guided Classification (CGC), which predict pseudo-labels that are then used to update the prediction models. Previous studies have yet to establish a clear and conclusive view of when and why one pseudo-labelling approach should be preferred over another, and of what causes an approach to fail. In this research, we propose a novel approach, “Predictor for Streaming Data with Scarce Labels” (PSDSL), which is capable of intelligently switching between self-learning, CGC and micro-clustering strategies based on the problem it is applied to, i.e., the characteristics of the data stream. PSDSL introduces a novel approach called Envelope-Clustering to resolve conflicts during cluster labelling; it uses a confidence measure to ensure the quality and correctness of the labels assigned to the clusters. The automatic parameter tuning mechanism of PSDSL eliminates the dependency on human input and determines the best number of centroids from the initial labelled data.
The predictive performance of PSDSL is evaluated on non-stationary datasets, synthetic data streams, and real-world datasets. Compared with state-of-the-art approaches, it shows promising results on randomised datasets as well as on synthetic data streams. This is the first large-scale study of an adaptive extreme verification latency approach that supports automatic parameter tuning and intelligent switching of the pseudo-labelling strategy, thus reducing the dependency of machine learning on human input.
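
The self-learning idea this abstract refers to (predict pseudo-labels for unlabelled data, then update the model with them) can be illustrated with a minimal sketch. This is not PSDSL, and it is not taken from the paper: the nearest-centroid classifier is swapped in as an assumption, and the names `fit_centroids`, `predict` and `self_learn` are hypothetical. Labels are available only for the initial batch; every later batch is unlabelled.

```python
from math import dist


def fit_centroids(points, labels):
    """Return {label: centroid}, where each centroid is the mean of its points."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Assign the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


def self_learn(centroids, batches):
    """Pseudo-label each unlabelled batch, then refit centroids on those labels.

    Returns the pseudo-labels per batch and the final centroids.
    """
    history = []
    for batch in batches:
        pseudo = [predict(centroids, p) for p in batch]
        # A class that received no pseudo-labels keeps its previous centroid.
        centroids = {**centroids, **fit_centroids(batch, pseudo)}
        history.append(pseudo)
    return history, centroids


# Illustrative data (assumed): two classes that drift upwards by one unit per batch.
initial_points = [(0, 0), (1, 0), (5, 0), (6, 0)]
initial_labels = [0, 0, 1, 1]
unlabelled = [
    [(0, 1), (1, 1), (5, 1), (6, 1)],
    [(0, 2), (1, 2), (5, 2), (6, 2)],
]
model = fit_centroids(initial_points, initial_labels)
history, model = self_learn(model, unlabelled)
```

Because the drift between batches is small relative to the distance between classes, the centroids follow the classes without any new labels. A sketch like this has no safeguard against wrong pseudo-labels reinforcing themselves, which is the kind of failure that motivates switching between strategies.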

An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions that change over time. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires a substantial amount of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this paper, we introduce compacted object sample extraction (COMPOSE), a computational geometry-based framework to learn from nonstationary streaming data, where labels are unavailable (or presented very sporadically) after initialization. We introduce the algorithm in detail, and discuss its results and performance on several synthetic and real-world data sets, which demonstrate the ability of the algorithm to learn under several different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch.

Articles published on Extreme Verification Latency

Adaptation for Automated Drift Detection in Electromechanical Machine Monitoring.

Novelty detection for multi-label stream classification under extreme verification latency

Adaptive Learning With Extreme Verification Latency in Non-Stationary Environments

AMANDA: Semi-supervised density-based adaptive model for non-stationary data with extreme verification latency

A Novelty Detector and Extreme Verification Latency Model for Nonstationary Environments

COMPOSE: A semisupervised learning framework for initially labeled nonstationary streaming data.
