Abstract

Most studies on stream mining frameworks handle model retraining and preprocessing together. We propose the Smart Preprocessing for Streaming Data (SPSD) approach, which separates the normalization of each numeric feature from model retraining. The goal of SPSD is to reduce the number of new models needed in a stream mining framework while maintaining comparable quality. The approach re-normalizes the data based on two metrics calculated from the minimum and maximum values of each chunk of data. In experiments with real-world data, we showed that SPSD maintains classification quality in approximately 50% of all data chunks by only re-normalizing the data, without building new classification models. In our comparison with traditional stream mining frameworks, we showed that these frameworks can benefit from SPSD in approximately 30% to 50% of all data chunks. The benefits include eliminating the cost of training new models in these chunks and reducing the total number of models.
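To make the chunk-level decision concrete, here is a minimal sketch under stated assumptions. The abstract does not define SPSD's two min-max metrics or their thresholds. The "range ratio" and "center shift" metrics, the thresholds `RANGE_RATIO_TOL` and `CENTER_SHIFT_TOL`, and all function names below are hypothetical placeholders, not the paper's definitions. The sketch only illustrates the general idea: compare each feature's min/max in a new chunk against a reference. If the drift is modest, it re-normalizes the chunk and keeps the existing model. Otherwise, it signals that a new model is needed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; the abstract does not give SPSD's actual values.
RANGE_RATIO_TOL = 0.5   # allowed relative change in a feature's range width
CENTER_SHIFT_TOL = 0.5  # allowed midpoint shift, in units of the reference range


@dataclass
class FeatureRange:
    lo: float
    hi: float

    @property
    def width(self) -> float:
        return self.hi - self.lo

    @property
    def mid(self) -> float:
        return (self.lo + self.hi) / 2.0


def column_ranges(chunk: List[List[float]]) -> List[FeatureRange]:
    """Min and max of each numeric feature (column) in a chunk."""
    return [FeatureRange(min(col), max(col)) for col in zip(*chunk)]


def min_max_normalize(chunk: List[List[float]],
                      ranges: List[FeatureRange]) -> List[List[float]]:
    """Scale each feature to [0, 1] using the given per-feature min/max."""
    return [
        [(x - r.lo) / r.width if r.width > 0 else 0.0 for x, r in zip(row, ranges)]
        for row in chunk
    ]


def drift_metrics(ref: FeatureRange, cur: FeatureRange) -> Tuple[float, float]:
    """Two hypothetical min-max-based metrics (placeholders, not SPSD's)."""
    ref_width = ref.width if ref.width > 0 else 1.0
    range_ratio = cur.width / ref_width
    center_shift = abs(cur.mid - ref.mid) / ref_width
    return range_ratio, center_shift


def process_chunk(chunk: List[List[float]],
                  reference: List[FeatureRange]) -> Tuple[str, List[List[float]]]:
    """Return ("renormalize" or "retrain", chunk normalized by its own min/max)."""
    current = column_ranges(chunk)
    for ref, cur in zip(reference, current):
        ratio, shift = drift_metrics(ref, cur)
        if abs(ratio - 1.0) > RANGE_RATIO_TOL or shift > CENTER_SHIFT_TOL:
            # Drift too large for re-normalization alone: build a new model.
            return "retrain", min_max_normalize(chunk, current)
    # Drift is modest: re-normalize with this chunk's min/max, keep the model.
    return "renormalize", min_max_normalize(chunk, current)
```

For example, with a reference built from `[[0.0, 10.0], [1.0, 20.0]]`, the chunk `[[0.25, 11.0], [1.25, 21.0]]` keeps both feature ranges the same width with small midpoint shifts, so the sketch returns `"renormalize"`. The chunk `[[5.0, 10.0], [9.0, 20.0]]` quadruples the first feature's range, so it returns `"retrain"`. Only chunks in the second category would incur model training cost under this scheme.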
