Abstract

With the rapid development of the Internet, real-time big-data computing frameworks such as Storm, S4, and Spark Streaming are widely used in real-time monitoring, real-time recommendation, real-time transaction analysis, and other systems that consume data streams in real time, and the Kafka messaging system on which they rely has been widely deployed. To guarantee message reliability, a Kafka cluster incurs substantial network, disk, and memory overhead, which increases the cluster load. To address this problem, a replica adaptive synchronization strategy based on message heat and replica update frequency is proposed. By dynamically adjusting replica synchronization, the strategy reduces system resource consumption while still guaranteeing message reliability. Experiments show that the Kafka cluster preserves message reliability, significantly reduces resource overhead, and improves cluster throughput, ensuring system availability and high performance.
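For context, the reliability-versus-overhead trade-off the paper targets is already visible in standard Kafka settings. The following is a minimal, illustrative Java producer configuration (the broker addresses, topic name, and concrete values are assumptions for this sketch, not taken from the paper) showing how stricter replica acknowledgment buys reliability at the cost of extra replication work per message.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker addresses are placeholders.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // acks=all: the leader waits until every in-sync replica (ISR) has stored the
        // record before acknowledging -- maximal reliability, maximal per-message
        // replication overhead. acks=1 or acks=0 trade reliability for throughput.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Bound retries so failures surface instead of being silently hidden.
        props.put(ProducerConfig.RETRIES_CONFIG, 3);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name "trade-events" is an assumed example.
            producer.send(new ProducerRecord<>("trade-events", "orderId-42", "BUY 100 ACME"));
        }
    }
}
```

On the broker side, `min.insync.replicas` (set per topic or cluster-wide) determines how many ISR members must confirm a write before `acks=all` succeeds; raising it tightens reliability but increases exactly the synchronization cost the proposed strategy tries to reduce.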

Highlights

  • With the rapid development of the Internet and mobile Internet, big-data application services such as real-time stock trading analysis and real-time recommendation in e-commerce have driven the deployment of real-time computing frameworks such as Storm, S4, and Spark Streaming. Most of these real-time computing frameworks are built on the Kafka messaging system

  • Because of data updates, network delay, server downtime, and other causes, the Kafka messaging system can suffer serious problems such as delayed messages, lost messages, and primary and secondary data falling out of sync, all of which affect the reliability of the messaging system. At present, the Kafka message cluster ensures data reliability through its replica mechanism

  • Since the Kafka cluster dynamically maintains the ISR set, this paper combines the reliability of the Kafka message queue with adaptive synchronization strategies used in cloud storage systems and proposes an adaptive strategy for synchronizing Kafka cluster replica data. The strategy dynamically adjusts the ISR set to guarantee data reliability while reducing the extra network, memory, and CPU overhead in the cluster (see the sketch after this list)

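The abstract and highlights do not spell out the algorithm, so the following is only a rough sketch of how a heat- and update-frequency-driven ISR adjustment could look. All names (`PartitionStats`, `AdaptiveIsrPlanner`, the thresholds) and the scoring logic are assumptions made for illustration, not the paper's definitions.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative per-partition statistics; field meanings are assumed, not taken from the paper. */
record PartitionStats(String topic, int partition, double heat, double updateFrequency,
                      List<Integer> followerIds) {}

/** Sketch of an adaptive synchronization decision driven by heat and update frequency. */
class AdaptiveIsrPlanner {
    // Assumed tuning knobs; the paper derives its thresholds from its own models.
    static final double HOT_THRESHOLD = 0.7;
    static final double HIGH_UPDATE_THRESHOLD = 0.5;

    /**
     * Decide which followers must synchronize eagerly (stay in the ISR) and which may
     * lag and catch up asynchronously, trading a little latency for lower overhead.
     */
    List<Integer> eagerFollowers(PartitionStats p) {
        boolean hot = p.heat() >= HOT_THRESHOLD;
        boolean busy = p.updateFrequency() >= HIGH_UPDATE_THRESHOLD;
        if (hot && busy) {
            // Hot, frequently updated partition: keep every follower synchronous.
            return p.followerIds();
        } else if (hot || busy) {
            // Moderately loaded partition: keep roughly half of the followers synchronous.
            int keep = Math.max(1, p.followerIds().size() / 2);
            return p.followerIds().stream().limit(keep).collect(Collectors.toList());
        }
        // Cold, rarely updated partition: one synchronous follower is enough; the rest
        // replicate lazily, saving network, disk, and memory.
        return p.followerIds().stream().limit(1).collect(Collectors.toList());
    }
}
```

The paper's actual strategy feeds its partition thermal model and replica update frequency into this kind of decision; the sketch only shows the general shape of such a control loop.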

Summary

Introduction

With the rapid development of the Internet and mobile Internet, big-data application services such as real-time stock trading analysis and real-time recommendation in e-commerce have driven the deployment of real-time computing frameworks such as Storm, S4, and Spark Streaming, most of which are built on the Kafka messaging system. Because of data updates, network delay, server downtime, and other causes, the Kafka messaging system can suffer serious problems such as delayed messages, lost messages, and primary and secondary data falling out of sync, all of which affect the reliability of the messaging system. At present, the Kafka message cluster ensures data reliability through its replica mechanism. This, however, brings additional network overhead, disk overhead, and memory consumption, which increases the cluster load and degrades the overall performance of the messaging system. To address this, a replica adaptive consistency synchronization strategy based on topic heat and replica update frequency is proposed
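As one plausible illustration of "topic heat" (not the paper's thermal or heat prediction model, which is defined in the sections listed below), heat can be tracked as an exponentially weighted moving average of recent message activity on a partition; the smoothing factor and counters here are assumptions.

```java
/**
 * Illustrative heat tracker: an exponentially weighted moving average (EWMA) of the
 * message rate observed on a partition. An assumed stand-in for the paper's partition
 * thermal and heat prediction models, not a reproduction of them.
 */
class PartitionHeatTracker {
    private final double alpha;   // smoothing factor in (0, 1]; assumed value
    private double heat = 0.0;    // current heat estimate (messages per interval)

    PartitionHeatTracker(double alpha) {
        this.alpha = alpha;
    }

    /** Fold the message count observed in the latest interval into the heat estimate. */
    void observe(long messagesInInterval) {
        heat = alpha * messagesInInterval + (1 - alpha) * heat;
    }

    double currentHeat() {
        return heat;
    }
}
```

A planner like the one sketched under Highlights could normalize such an estimate against the hottest partition in the cluster before comparing it to a threshold.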

Related work
Kafka message queue reliability
Kafka reliability
Partition thermal model
Heat prediction model
Replica update frequency of the partition
Replica adaptive consistency strategy
Analysis of the total message update volume in the cluster
Cluster throughput rate analysis
Conclusion