Abstract

This paper presents an approximate online algorithm for performing the Kolmogorov–Smirnov test. Evaluating the fit between statistical distributions and data samples is a ubiquitous need, and this test conveniently meets it. Taking inspiration from the challenge of detecting concept drift in data streams, our methodology shows how this goodness-of-fit test can be used to detect such events. It takes advantage of the facts that the test is non-parametric and that it can be adapted to handle streams while keeping its relatively low algorithmic complexity. The present work focuses on the one-sample test, which evaluates the hypothesis that a given univariate sample follows some reference distribution, with the aim of assessing an input stream with high precision in a time- and space-efficient fashion. We compared the performance of our algorithm with that of several state-of-the-art methods on synthetic and real data. We evaluated the accuracy, effectiveness and efficiency of these methods through extensive experiments in multiple scenarios, varying the reference distribution and its parameters, the sample size, the available memory, the drift point and the query interval. The results showed that our algorithm is advantageous in most cases, even under substantial restrictions on computational resources.
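For orientation, the sketch below shows the exact batch one-sample statistic, D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample and F is the reference CDF. It then shows a naive streaming baseline that recomputes D_n over a sliding window and flags drift when D_n exceeds the asymptotic critical value c(alpha)/sqrt(n), with c(0.05) ≈ 1.358. This is not the approximate online algorithm described in this paper. The function names (`ks_statistic`, `uniform_cdf`, `sliding_window_drift`) and the window-based design are hypothetical illustrations, assumed only to make the test concrete.

```python
import math
import random
from collections import deque
from typing import Callable, Iterable, Iterator, List


def ks_statistic(sample: List[float], cdf: Callable[[float], float]) -> float:
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    Sorting costs O(n log n). The supremum is attained at a sample point,
    either just before or just after the empirical CDF jumps there.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of the Uniform(0, 1) reference distribution (illustrative choice)."""
    return min(max(x, 0.0), 1.0)


def sliding_window_drift(
    stream: Iterable[float],
    cdf: Callable[[float], float],
    window: int,
    c_alpha: float = 1.358,  # asymptotic critical constant for alpha = 0.05
) -> Iterator[int]:
    """Hypothetical baseline: yield stream positions where the last `window`
    items reject H0 ("the window follows `cdf`") at the asymptotic level.

    Keeps O(window) memory and re-sorts the window on every arrival.
    """
    buf = deque(maxlen=window)
    threshold = c_alpha / math.sqrt(window)
    for t, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window and ks_statistic(list(buf), cdf) > threshold:
            yield t


if __name__ == "__main__":
    random.seed(0)
    # Uniform data, then an abrupt drift to U**2 (CDF sqrt(x)) at position 2000.
    stream = [random.random() for _ in range(2000)]
    stream += [random.random() ** 2 for _ in range(2000)]
    alarms = list(sliding_window_drift(stream, uniform_cdf, window=200))
    print("first alarm at or after the drift:", next(t for t in alarms if t >= 2000))
```

Because the baseline tests the same hypothesis on every arrival, its false-alarm rate across the whole stream is far higher than the nominal alpha, and each check re-sorts the full window. These per-query time and memory costs are the kind that a memory-bounded online approach aims to reduce.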
