Abstract

Many real-world data sets naturally arrive as rapid and virtually unbounded streams. Examples of such streams include network traffic at a router, events observed by a sensor network, accesses to a web server, and transactional updates to a large database. Such streaming data need to be monitored online to collect traffic statistics, detect trends and anomalies, tune system performance, and help make business decisions. However, because of the large size and rapid pace of the data, as well as the online processing requirement, conventional data processing methods, such as storing the data in a database and issuing offline SQL queries afterward, are not feasible. Data stream processing is a new paradigm for processing massive data sets and creates new challenges in algorithm design and implementation.

In this thesis, we consider time-decayed data aggregation for data streams, where the importance or contribution of each data element decays over time: recent data are usually considered more important in applications and are therefore given heavier weights. We design small-space data structures and algorithms for maintaining fundamental aggregates of the streams where this is possible, and otherwise show large space lower bounds.

We consider data aggregation over a robust data stream model called the asynchronous data stream, motivated by streaming data transmitted in distributed systems, including computer networks, where asynchrony in data transmission is inevitable. In an asynchronous data stream, the arrival order of the data elements at the receiver is not necessarily the same as the order in which the data elements were generated. The asynchronous data stream model is more robust than the earlier synchronous data stream model and generalizes it.

In summary, this thesis presents the following results:
• We formalize the model of asynchronous data stream and the notion of timestamp sliding window.
• We propose the first small-space sketch for summarizing the data elements over timestamp sliding windows of multiple geographically distributed asynchronous data streams. The sketch can return accuracy-guaranteed estimates for basic aggregates, such as Sum, Median and Quan-
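
To make the terms above concrete, the following is a minimal illustrative sketch of a timestamp sliding window and a time-decayed sum over an asynchronous stream, where elements may arrive out of timestamp order. It is not the small-space sketch proposed in the thesis: it stores every element and so uses linear space, serving only as an exact baseline. The class name ExactTimestampWindow, its method names, and the inclusive window [now - width, now] are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right, insort


class ExactTimestampWindow:
    """Exact, linear-space baseline (hypothetical; NOT a small-space sketch)."""

    def __init__(self):
        # (timestamp, value) pairs kept sorted by timestamp,
        # regardless of the order in which they arrive.
        self._items = []

    def observe(self, timestamp, value):
        # Arrival order may differ from generation order, so each
        # element is inserted at its timestamp position.
        insort(self._items, (timestamp, value))

    def window_sum(self, now, width):
        # Sum of values whose timestamp lies in [now - width, now].
        lo = bisect_left(self._items, (now - width, float("-inf")))
        hi = bisect_right(self._items, (now, float("inf")))
        return sum(v for _, v in self._items[lo:hi])

    def decayed_sum(self, now, decay):
        # General time decay: each value is weighted by decay(age),
        # where age = now - timestamp. Future elements are ignored.
        return sum(v * decay(now - t) for t, v in self._items if t <= now)


stream = ExactTimestampWindow()
# Elements arrive out of timestamp order: 10, 7, 12, 3.
for ts, val in [(10, 5), (7, 2), (12, 1), (3, 4)]:
    stream.observe(ts, val)

total = stream.window_sum(now=12, width=5)
# A sliding window is the special case of a 0/1 decay function.
same_total = stream.decayed_sum(12, lambda age: 1 if age <= 5 else 0)
```

Here the window query at time 12 with width 5 covers timestamps 7 through 12, so the element with timestamp 3 is excluded even though it arrived last. The design goal described in the abstract is to answer such queries with far less memory than this baseline, at the cost of approximate answers.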
