Abstract

Effective Big Data analytics must rely on algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event-monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous network devices needs to be continuously collected and analyzed for interesting trends and real-time reaction to different scenarios (e.g., hotspots or DDoS attacks). In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying communication infrastructure. This course will provide an overview of some key algorithmic tools for supporting effective, real-time analytics over streaming data. Our primary focus will be on small-space sketch synopses for approximating continuous data streams, and their applicability in both centralized and distributed settings.

Syllabus:
1. Introduction and Motivation
2. Data Streaming Models and Mathematical Tools
3. Basic Algorithmic Tools for Data Streams
   • Reservoir Sampling
   • Bag Synopses: AMS and CountMin Sketches
   • Set Synopses: FM Sketches and Distinct Sampling
4. The Sliding Window Model and Exponential Histograms
5. Distributed Data Streaming
   • Basic Models and Techniques
   • The Geometric Method and Convex Safe Zones
6. Conclusions and Looking Forward
7. Hands-on Experience with Streaming Tools
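
As a concrete illustration of the one-pass, limited-memory setting described above, the following is a minimal sketch of reservoir sampling, the first tool listed in the syllabus. It is not taken from the course material: the function name reservoir_sample and its parameters are assumptions chosen for illustration, and the code follows the classic "Algorithm R" formulation, which keeps a uniform random sample of k items while seeing each stream element exactly once.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a one-pass stream.

    Hypothetical helper for illustration; uses O(k) memory regardless of
    how long the stream is.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) is kept with probability k / (i + 1).
            # randint is inclusive on both ends, so j is uniform over i + 1 values.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 readings from a stream that is consumed exactly once.
    readings = (x * x for x in range(100_000))
    print(reservoir_sample(readings, 5))
```

The sample is only as large as k, so memory use does not grow with the stream, which is the property that makes this kind of synopsis suitable for the monitoring scenarios mentioned in the abstract.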

