Abstract

The production and real-time use of streaming data bring new challenges for data systems because of the huge volume of streaming data and the fast response times that applications demand. Message queuing systems that offer high throughput and low latency play an important role in today's big streaming data processing. Several popular message queuing systems are in production use, and many experimental systems exist in academia. These systems follow different design philosophies and therefore have different characteristics, so it is non-trivial for a non-expert to choose a system that meets their specific requirements. With this premise, our primary contribution is to provide the community with a fair comparison of message queuing systems, using a standardized comparison metric and a reproducible experimental environment. Five typical message queuing systems (Kafka, RabbitMQ, RocketMQ, ActiveMQ, and Pulsar) are evaluated qualitatively (through analysis) and quantitatively (through experimental results). This article also highlights the distinct features of each system and summarizes its best-suited use cases. The fair comparison and the insightful analysis provided in this article can help users choose the message queuing system best suited to their needs.

Highlights

  • MESSAGE QUEUING SYSTEM FEATURES: We summarize the main features of message queuing systems, which can be used to establish a common framework for comparing them

  • PRODUCTION FEATURES: We summarize a few production features and design choices of the message queuing systems as follows: 1) DEVELOPMENT LANGUAGE: Different message queuing systems are written in different languages, and the characteristics of each language bring corresponding advantages to the system

  • ON BEST-SUITED USE CASES: Our comprehensive analysis and test results provide a fair comparison of these message queuing systems


Summary

Introduction

In the era of information explosion, huge amounts of data are produced, transmitted, and consumed continuously every day. Streaming data are generated continuously by thousands of data sources, which typically send data records simultaneously. Streaming data include a wide variety of data, such as log files, online purchase records, geospatial data, and information from social networks and financial trading floors. The production and real-time use of these streaming data bring new challenges for data systems because of their huge volume and the fast response times required. Traditional distributed file systems (e.g., HDFS [1]), cloud storage systems (e.g., Amazon S3 [2]), and key-value store systems (e.g., Apache Cassandra [3]) are not well suited to real-time processing of these streaming data [4]. Distributed message queuing systems play an increasingly important role in streaming data processing applications, such as high-quality real-time search, analysis, and recommendation services.

