Abstract
This paper presents an approximate online algorithm for performing the Kolmogorov–Smirnov test. Evaluating the goodness of fit between statistical distributions and data samples is a ubiquitous need, which this test conveniently meets. Motivated by the challenges of detecting concept drift in data streams, our methodology shows how this goodness-of-fit test can be used to detect such events, taking advantage of the fact that it is non-parametric and can be adapted to handle streams while keeping its original, relatively low algorithmic complexity. The work focuses on the one-sample test, which evaluates the hypothesis that a given univariate sample follows some reference distribution, in order to assess an input stream with high precision in a time- and space-efficient fashion. We compared the performance of our algorithm with that of several state-of-the-art methods using synthetic and real data. We evaluated the accuracy, effectiveness and efficiency of these methods through extensive experiments in multiple scenarios, varying the reference distribution and its parameters, the sample size, the available memory, the drift point and the query interval. The results showed that our algorithm is advantageous in most cases, even under substantial restrictions on computational resources.
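For readers unfamiliar with the one-sample test, the sketch below computes the classical (offline, exact) Kolmogorov–Smirnov statistic, i.e. the largest gap between the empirical distribution function of a sample and a reference CDF. It is not the approximate online algorithm presented in this paper; it only illustrates the quantity that such an algorithm approximates. The function names `ks_statistic`, `ks_critical_value` and `uniform_cdf` are illustrative assumptions, and the critical value uses the standard large-sample approximation rather than exact tables.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    Offline baseline: sorts the whole sample, so it needs O(n) memory
    and O(n log n) time. Hypothetical helper, not the paper's method.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the
        # supremum is attained just before or exactly at a sample point.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Asymptotic critical value: reject the reference distribution if D_n exceeds it."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1], used as an example reference."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    data = [0.25, 0.75]
    d = ks_statistic(data, uniform_cdf)
    print(d, d > ks_critical_value(len(data)))
```

In a streaming setting this baseline is impractical, because the full sorted sample must be kept in memory and reprocessed at every query, which is the cost an approximate online variant aims to avoid.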