Abstract

Most traditional top-k algorithms assume a single-server setting. When applied to a distributed system, they may be highly inefficient and/or incur heavy communication overhead. The problem of top-k monitoring in distributed environments has therefore been studied intensively in recent years. This paper studies how to monitor the top-k data objects with the largest aggregate numeric values from distributed data streams within a fixed-size monitoring window W, while minimizing communication cost across the network. We propose a novel algorithm that adaptively reallocates the numeric values of data objects among distributed nodes by assigning revision factors whenever local constraints are violated, thereby keeping the local top-k results at the distributed nodes consistent with the global top-k result. We also develop a framework that combines a distributed data stream monitoring architecture with a sliding window model. Based on this framework, extensive experiments conducted on Apache Storm verify the efficiency and scalability of the proposed algorithm.
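The following Python sketch makes local constraints and revision factors concrete. It is not the paper's algorithm. It assumes an adjustment-factor style of distributed top-k monitoring with these properties:

- Each node i holds a partial value V_{o,i} and an additive revision factor delta_{o,i} for every object o.
- The revision factors of each object sum to zero across nodes.
- A node's local constraint holds when every object in the current global top-k set T keeps an adjusted value at least as large as every object outside T.

The names `local_constraints_hold` and `reallocate_evenly` are hypothetical. The even split used for reallocation is one simple valid choice, not the proposed method.

```python
from typing import Dict, List, Set


def local_constraints_hold(
    values: Dict[str, float],    # V_{o,i}: partial value of each object at node i (assumed model)
    revision: Dict[str, float],  # delta_{o,i}: revision factor of each object at node i (assumed)
    topk: Set[str],              # T: current global top-k set
) -> bool:
    """True if, after revision, every object in T is at least as large as every object outside T."""

    def adjusted(o: str) -> float:
        return values.get(o, 0.0) + revision.get(o, 0.0)

    outside = (set(values) | set(revision)) - topk
    if not topk or not outside:
        return True
    return min(adjusted(o) for o in topk) >= max(adjusted(o) for o in outside)


def reallocate_evenly(node_values: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Hypothetical reallocation: delta_{o,i} = S_o / m - V_{o,i}, where S_o is the global
    sum of object o over m nodes. Every node then sees the same adjusted value S_o / m,
    and the revision factors of each object sum to zero across nodes."""
    m = len(node_values)
    objects = set().union(*node_values)  # all object ids seen at any node
    totals = {o: sum(v.get(o, 0.0) for v in node_values) for o in objects}
    return [{o: totals[o] / m - v.get(o, 0.0) for o in objects} for v in node_values]


# Usage: two nodes; the global top-1 is "a" (totals: a=10, b=9, c=9).
nodes = [{"a": 10.0, "b": 1.0, "c": 4.0}, {"a": 0.0, "b": 8.0, "c": 5.0}]
topk = {"a"}
print(local_constraints_hold(nodes[1], {}, topk))  # False: "b" dominates "a" at node 2
revisions = reallocate_evenly(nodes)
print(all(local_constraints_hold(v, r, topk) for v, r in zip(nodes, revisions)))  # True
```

After reallocation, every node sees the same adjusted value for each object, so every local top-k agrees with the global top-k (up to ties). The even split requires the coordinator to learn all partial values, however. It therefore only illustrates correctness; a communication-efficient algorithm would reallocate more selectively.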

Highlights

  • The study of distributed top-k monitoring is significant in a variety of application scenarios, such as network monitoring, sensor data analysis, web usage logs, and market surveillance

  • We propose a novel algorithm, which adaptively reallocates numeric values of data objects among distributed nodes by assigning revision factors when local constraints are violated and keeps the local top-k result at distributed nodes in line with the global top-k result

  • We propose a novel algorithm for top-k monitoring over distributed data streams, which achieves a significant reduction in communication cost


Introduction

The study of distributed top-k monitoring is significant in a variety of application scenarios, such as network monitoring, sensor data analysis, web usage logs, and market surveillance. Consider a system that monitors a large network for distributed denial-of-service (DDoS) attacks. A DDoS attack may issue an unusually large number of Domain Name System (DNS) lookup requests from a single IP address to distributed DNS servers, so DNS lookup requests exhibiting such suspicious behavior must be monitored. In this case, the monitoring infrastructure continuously reports the top-k IP addresses with the largest number of requests across the distributed servers within a recent time window. Because requests arrive frequently and rapidly at the distributed DNS servers, forwarding every request to a central location for processing is infeasible, since it would incur huge communication overhead.
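The monitoring query in this DNS example can be written down as code. The Python sketch below assumes a time-based sliding window of length `window` at each DNS server and shows what the monitor must report. The class `WindowCounter` and the function `global_topk` are hypothetical names.

`global_topk` is exactly the centralized baseline the paragraph rules out, because it needs every node's full count table. It defines the answer a communication-efficient algorithm must match, not how to compute it.

```python
from collections import Counter, deque
from heapq import nlargest
from typing import Deque, Iterable, List, Tuple


class WindowCounter:
    """Counts requests per IP address over the last `window` time units at one node.
    The time-based window model is an assumption for illustration."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, ip) in arrival order
        self.counts: Counter = Counter()

    def add(self, ts: float, ip: str) -> None:
        self.events.append((ts, ip))
        self.counts[ip] += 1
        self.expire(ts)

    def expire(self, now: float) -> None:
        # Keep only events in (now - window, now]; assumes non-decreasing timestamps.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_ip = self.events.popleft()
            self.counts[old_ip] -= 1
            if self.counts[old_ip] == 0:
                del self.counts[old_ip]


def global_topk(nodes: Iterable[WindowCounter], k: int) -> List[Tuple[str, int]]:
    """Exact global top-k obtained by summing every node's window counts (centralized baseline)."""
    total: Counter = Counter()
    for node in nodes:
        total.update(node.counts)
    return nlargest(k, total.items(), key=lambda kv: kv[1])


# Usage: two DNS servers with a window of 10 time units.
dns1, dns2 = WindowCounter(10.0), WindowCounter(10.0)
for t, ip in [(0.0, "1.1.1.1"), (1.0, "2.2.2.2"), (2.0, "1.1.1.1")]:
    dns1.add(t, ip)
for t, ip in [(3.0, "2.2.2.2"), (4.0, "2.2.2.2")]:
    dns2.add(t, ip)
print(global_topk([dns1, dns2], 1))  # [('2.2.2.2', 3)]
dns1.add(11.0, "3.3.3.3")            # expires the events at t=0 and t=1 on dns1
print(global_topk([dns1, dns2], 1))  # [('2.2.2.2', 2)]
```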
