Abstract

Batch and stream processing are separately and efficiently applied in many applications. However, some newer data-driven applications such as the Internet of Things and cloud computing call for hybrid processing approaches in order to handle the speed and accuracy required for processing such complex data. In this paper, we propose a Hybrid Distributed Batch-Stream (HDBS) architecture for anomaly detection in real-time data. The hybrid architecture, while benefiting from the accuracy provided by batch processing, also enjoys the speed and real-time features of stream processing. In the proposed architecture, our focus is on the algorithmic aspects of hybrid processing including the interaction models between batch and stream processing units, the characteristics of batch and stream machine learning algorithms and the principles of merging the results of different processing units. The driving idea of such combination is that the results of batch and stream processing units are complementary with each other, as one of them constructs accurate models based on previous data, and the other one is capable of processing new stream data in real-time. Furthermore, we propose a generalized version of the HDBS with respect to its algorithms and communication policy levels. In the generalized HDBS architecture, we address the various aspects of the interaction between the batch and stream processing units, and the merging operations to produce the final results. the evaluations of the proposed architecture using various criteria (accuracy, space complexity, and time complexity) demonstrate that the accuracy of the proposed method is higher than the accuracy of the batch processing methods, its time complexity is also similar to one of the stream processing methods and much less than the batch processing methods, which makes our proposed architecture an efficient and practical solution for real-time anomaly detection.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call