Abstract

The spread of real-time applications has led to a huge amount of data being shared between users. This vast volume of data, rapidly evolving over time, is referred to as a data stream. Clustering and processing such data poses many challenges to the data mining community. Indeed, traditional data mining techniques become infeasible for mining such a continuous flow of data, where characteristics, features, and concepts change rapidly over time. This paper presents a novel method for data stream clustering. It addresses the major challenges of data stream processing, namely infinite length, concept drift, novelty detection, and feature evolution. To handle these issues, the proposed method uses the Artificial Immune System (AIS) meta-heuristic, which has been widely used for data mining tasks and has the adaptability required by data stream clustering algorithms. Our method, called AIS-Clus, is able to detect novel concepts using the performance of the learning process of the AIS meta-heuristic. Furthermore, AIS-Clus can adapt its model to handle concept drift and feature evolution in textual data streams. Experiments on textual datasets yielded efficient and promising results.

Highlights

  • In recent years, the emergence of new technologies and real-time applications has brought an enormous volume of data shared among users worldwide at any time

  • Datasets: To test the efficiency of our algorithm, we adopt Twitter datasets, as they are best modeled as data streams

  • Handling concept drift and feature evolution: To cope with concept drift in text data streams, we focus on a term-weighting strategy

Summary

Introduction

The emergence of new technologies and real-time applications has brought an enormous volume of data shared among users worldwide at any time. The fast and continuous flow of information generated from real-time applications is known as a data stream. A data stream can be defined as an ordered sequence of data items flowing continuously at a high rate. Many sources, such as social networks, mobile and Web applications, and telecommunication services, may generate these streams of data. This real-time data is omnipresent in our daily life and may contain a lot of knowledge and hidden information. Many companies exploit these flows of data to increase their sales by matching their products to the expectations and interests of potential buyers.

