Data Streams

João Gama, Pedro Pereira Rodrigues

doi:10.4018/978-1-60566-010-3.ch088

Abstract

Nowadays, databases are required to store massive amounts of data that are continuously inserted and queried. Organizations use decision support systems to identify potentially useful patterns in data. Data analysis is complex, interactive, and exploratory over very large volumes of historical data, possibly stored in distributed environments. What distinguishes current data sources from earlier ones is the continuous flow of data and the automatic data feeds. We do not just have people entering information into a computer. Instead, we have computers entering data into each other (Muthukrishnan, 2005). Some illustrative examples of the magnitude of today's data include: 3 billion telephone calls per day, 4 gigabytes of data from radio telescopes every night, 30 billion emails per day, 1 billion SMS, 5 gigabytes of satellite data per day, and 70 billion IP network traffic per day. In these applications, data is best modelled not as persistent tables but rather as transient data streams. In some applications it is not feasible to load the arriving data into a traditional Database Management System (DBMS), and traditional DBMSs are not designed to directly support the continuous queries required in these applications (Alon et al., 1996; Babcock et al., 2002; Cormode & Muthukrishnan, 2003). These sources of data are called data streams. Computers play a much more active role in the current trends in decision support and data analysis. Data mining algorithms search for hypotheses, and evaluate and suggest patterns. The pattern discovery process requires online ad hoc queries, not previously defined, that are successively refined. Due to the exploratory nature of these queries, an exact answer may not be required: a user may prefer a fast but approximate answer to an exact but slow one. Processing queries over streams requires radically different types of algorithms.
Range queries and selectivity estimation (the proportion of tuples that satisfy a query) are two illustrative examples where fast but approximate answers are more useful than slow and exact ones. Approximate answers are obtained from small data structures (synopses) attached to the database that summarize information and can be updated incrementally.
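
As an illustration of the idea, the sketch below shows one possible synopsis: an equi-width histogram that is updated incrementally as values arrive and that answers range-selectivity queries approximately. This is a minimal sketch and not a method taken from the chapter. The class name `EquiWidthHistogram`, its parameters, the assumption that the value range is known in advance, and the assumption that values are spread uniformly inside each bin are all hypothetical choices made for the example.

```python
class EquiWidthHistogram:
    """Fixed-size synopsis of a numeric stream (illustrative sketch only).

    Assumptions: the value domain [low, high) is known beforehand, and
    values inside a bin are treated as uniformly distributed.
    """

    def __init__(self, low, high, n_bins=10):
        if high <= low or n_bins <= 0:
            raise ValueError("need low < high and n_bins > 0")
        self.low = low
        self.high = high
        self.n_bins = n_bins
        self.width = (high - low) / n_bins
        self.counts = [0] * n_bins  # memory is O(n_bins), independent of stream length
        self.total = 0

    def update(self, value):
        """Incorporate one arriving value in constant time."""
        if not (self.low <= value < self.high):
            return  # assumption: out-of-range values are simply ignored
        idx = min(int((value - self.low) / self.width), self.n_bins - 1)
        self.counts[idx] += 1
        self.total += 1

    def selectivity(self, a, b):
        """Estimate the fraction of values seen so far that fall in [a, b)."""
        if self.total == 0:
            return 0.0
        a = max(a, self.low)
        b = min(b, self.high)
        if b <= a:
            return 0.0
        covered = 0.0
        for i, count in enumerate(self.counts):
            bin_lo = self.low + i * self.width
            bin_hi = bin_lo + self.width
            overlap = min(b, bin_hi) - max(a, bin_lo)
            if overlap > 0:
                # Partially covered bins contribute proportionally to the overlap.
                covered += count * overlap / self.width
        return covered / self.total


# Usage: feed a stream one value at a time, then query the synopsis.
hist = EquiWidthHistogram(0.0, 100.0, n_bins=10)
for v in range(100):
    hist.update(float(v))
print(hist.selectivity(20.0, 50.0))
```

The stream itself is never stored: only the bin counts are kept, so the query is answered from the synopsis alone. The price is accuracy, since the estimate is exact only when a query boundary coincides with a bin boundary or the data really are uniform inside each bin.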
