Abstract

Incremental feature extraction algorithms are designed to analyze large-scale data streams. Many of them suffer from high computational cost, time complexity, and data dependency, which adversely affect the processing of the data stream. With this motivation, this paper presents a novel incremental feature extraction approach for data streams based on the Discrete Cosine Transform (DCT). The proposed approach is separated into an initial phase and a sequential phase, and each phase uses a fixed-size windowing technique to process the current samples. The initial phase is performed only on the first window to construct the initial model as a baseline. In this phase, normalization and DCT are applied to each sample in the window. Subsequently, an efficient feature subset is determined by a particle swarm optimization-based method. Once the initial model has been constructed, the sequential phase begins. The normalization and DCT processes are likewise applied to each sample. Afterward, the feature subset is selected according to the initial model. Finally, the k-nearest neighbor classifier is employed for classification. The approach is tested on well-known streaming data sets and compared with state-of-the-art incremental feature extraction algorithms. The experimental studies demonstrate the proposed approach’s success in terms of recognition accuracy and learning time.
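The two-phase flow described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the normalization is taken to be per-sample min-max scaling, the DCT is the unnormalized DCT-II, the particle swarm optimization step is replaced by a hypothetical stand-in (`select_subset_stub`) that simply keeps the first few coefficients, and the k-nearest neighbor classifier uses only the labeled samples of the first window as its reference set. All function names are hypothetical.

```python
import math
from collections import Counter

def min_max_normalize(x):
    """Scale one sample to [0, 1] (assumed normalization); constant samples map to zeros."""
    lo, hi = min(x), max(x)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in x]

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def select_subset_stub(n_features, n_keep):
    """Hypothetical stand-in for the PSO-based selection: keep the first n_keep coefficients."""
    return list(range(min(n_keep, n_features)))

def extract(sample, subset):
    """Normalize, transform, then keep only the coefficients chosen in the initial phase."""
    coeffs = dct2(min_max_normalize(sample))
    return [coeffs[k] for k in subset]

def knn_predict(model, query, k=3):
    """Majority vote among the k reference points closest to query (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(model, key=lambda item: sq_dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def windows(stream, size):
    """Split the stream into consecutive fixed-size windows (the last one may be shorter)."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]

def run_stream(stream, window_size, n_keep, k=3):
    """stream: list of (sample, label) pairs. Returns predictions for the sequential phase."""
    wins = list(windows(stream, window_size))
    first = wins[0]
    # Initial phase: fix the feature subset and build the baseline model from the first window.
    subset = select_subset_stub(len(first[0][0]), n_keep)
    model = [(extract(sample, subset), label) for sample, label in first]
    # Sequential phase: reuse the same subset for every later window and classify each sample.
    predictions = []
    for win in wins[1:]:
        for sample, _ in win:
            predictions.append(knn_predict(model, extract(sample, subset), k))
    return predictions

if __name__ == "__main__":
    toy_stream = [
        ([0, 1, 2, 3], "up"), ([3, 2, 1, 0], "down"),
        ([1, 3, 5, 7], "up"), ([8, 6, 4, 2], "down"),
        ([1, 2, 3, 5], "up"), ([9, 7, 4, 1], "down"),
    ]
    print(run_stream(toy_stream, window_size=4, n_keep=2))
```

In this sketch the feature subset is chosen once, in the initial phase, and every later window pays only for normalization, one DCT per sample, and an index lookup. That is one plausible reading of how the approach keeps the sequential phase cheap.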

Highlights

  • The rapid growth of technology increases application areas day by day

  • The linear algorithms are the traditional Principal Component Analysis (PCA) [13], incremental versions of PCA (IPCA) proposed by Li [18] (IPCA-Li), IPCA proposed by Ozawa [19]

  • The most popular incremental feature extraction algorithms are the incremental versions of PCA


Summary

Introduction

The rapid growth of technology increases application areas day by day. In recent years, application areas such as social networks [1], electronic business [2], cloud computing [3,4], computer network measurement [5,6,7], and internet of things applications [8] have been generating large volumes of data [9]. Such large-volume data are known as data streams, and they have distinct characteristics. The probability distribution of a data stream may change dynamically over time. The stream is processed in real time without intermission. Collecting the true class labels of all instances in the stream is infeasible in real-time scenarios. These characteristics make processing data streams a major challenge [10].

Methods
Results
Conclusion
