Abstract

In this study, we propose a distributed architecture that dynamically updates the model used to classify tweet streams generated in real time. The architecture ingests data streams through Apache Kafka and classifies them with Apache Spark Streaming. To reflect changes in the input stream, we design a classification model whose tokenizer and classifier are both updated as new tweets arrive. The dynamic updates keep classification effective as the stream changes, and parallel processing in a distributed environment keeps it efficient. In experiments on cyberattack-related tweets, the model's F1-score improves gradually from 0.8869 with the initial 50,000 tweets to 0.9094 once 200,000 tweets have accumulated.
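The idea of updating both the tokenizer and the classifier per batch can be illustrated with a minimal, single-process sketch. This is not the authors' implementation: the abstract does not name the classifier, so a multinomial Naive Bayes with Laplace smoothing is assumed here, and the class name `IncrementalTweetClassifier`, the regex tokenizer, and the labels `"attack"` and `"other"` are all hypothetical. Plain Python lists stand in for the micro-batches that Kafka and Spark Streaming would deliver.

```python
import math
import re
from collections import Counter, defaultdict


class IncrementalTweetClassifier:
    """Hypothetical sketch: Naive Bayes whose vocabulary and counts grow per batch."""

    def __init__(self):
        self.vocab = set()                        # tokenizer state (known tokens)
        self.class_docs = Counter()               # label -> number of tweets seen
        self.token_counts = defaultdict(Counter)  # label -> token -> count

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase words, digits, hashtags, and mentions.
        return re.findall(r"[a-z0-9#@']+", text.lower())

    def update(self, batch):
        """Fold one batch of (text, label) pairs into the tokenizer and classifier."""
        for text, label in batch:
            tokens = self.tokenize(text)
            self.vocab.update(tokens)                # tokenizer update
            self.class_docs[label] += 1              # classifier update (priors)
            self.token_counts[label].update(tokens)  # classifier update (likelihoods)

    def predict(self, text):
        """Return the label with the highest smoothed log-probability."""
        tokens = [t for t in self.tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Two toy batches standing in for successive stream windows.
clf = IncrementalTweetClassifier()
clf.update([("new ransomware attack hits hospital", "attack"),
            ("lovely sunny day at the beach", "other")])
clf.update([("phishing campaign steals bank credentials", "attack"),
            ("great coffee this morning", "other")])
print(clf.predict("phishing credentials"))
```

Because the model keeps only counts, each new batch is merged without retraining from scratch. In a distributed setting, per-partition counts could in principle be summed in the same way, though how the paper partitions this work is not stated in the abstract.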

