Abstract

Real-time monitoring of microblog data can detect sensitive information promptly and support the management and control of public sentiment. However, it requires processing large-scale data streams. MapReduce is a framework for processing large-scale data in batch mode; it is designed to maximize throughput, so its real-time performance is limited. To address this limitation, RT-SSP (Real-Time Staged Stream Processing), a hybrid staged stream processing scheme that supports both batch and real-time processing, is proposed. In this scheme, large-scale high-speed data streams are processed locally in stages, communication cost is reduced by storing intermediate results on the local node, and key techniques such as cache optimization are used to achieve highly concurrent reads and writes. Experiments show that RT-SSP improves the real-time performance of processing large-scale microblog data streams, achieving a speed-up ratio of about 2.3.
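The core idea of keeping intermediate results on the local node, so that only compact summaries cross the network, can be illustrated with a toy two-stage pipeline. This is a minimal sketch under stated assumptions, not the RT-SSP implementation: the class `LocalNode`, the function `stage2`, and the keyword set `SENSITIVE` are hypothetical names, and counting keywords is a stand-in for whatever "sensitive information" detection the actual system performs.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical keyword set standing in for "sensitive information" detection.
SENSITIVE = {"protest", "leak"}


class LocalNode:
    """Hypothetical processing node that keeps its intermediate results locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Intermediate results stay in this node's memory (a simple local cache).
        self.local_counts: Counter = Counter()

    def stage1(self, posts: Iterable[str]) -> None:
        # Stage 1: process each arriving micro-batch of posts on this node,
        # updating local state instead of shipping raw posts elsewhere.
        for post in posts:
            for word in post.lower().split():
                if word in SENSITIVE:
                    self.local_counts[word] += 1

    def summary(self) -> Dict[str, int]:
        # Only this compact summary leaves the node, not the raw posts.
        return dict(self.local_counts)


def stage2(nodes: List[LocalNode]) -> Counter:
    # Stage 2: merge the per-node summaries into a global view.
    total: Counter = Counter()
    for node in nodes:
        total.update(node.summary())
    return total


if __name__ == "__main__":
    nodes = [LocalNode("n1"), LocalNode("n2")]
    nodes[0].stage1(["Leak reported downtown", "nice weather"])
    nodes[1].stage1(["protest planned", "another leak"])
    print(stage2(nodes))
```

The design choice being illustrated is that communication volume in stage 2 scales with the number of distinct keys per node rather than with the number of posts, which is one way storing intermediate results locally can reduce communication cost.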
