Adaptive Query Processing on Raw Data Files

Ioannis Alagiannis

doi:10.5075/epfl-thesis-6644

Abstract

Business and scientific applications accumulate data at an increasing pace, and this growth has already started to exceed the capabilities of database management systems (DBMS). In a typical DBMS usage scenario, the user must define a schema, load the data and tune the system for an expected workload before submitting any queries. Copying data into a database is a significant investment in time and resources, and in many cases it is unnecessary or, given the explosive data growth, no longer feasible in practice. Additionally, the way a DBMS stores and organizes data during loading determines how the data will be accessed for a given workload and thus the maximum achievable performance. Selecting the underlying data layout (row-store or column-store) is a critical first tuning decision that cannot be changed afterwards. Nevertheless, query analysis today is not static; it evolves as queries change. Hence, static design decisions can be suboptimal. In this thesis, we advocate in situ query processing as the principal way to manage data in a database. We reconsider the data loading phase and redesign traditional query processing architectures to work efficiently over raw data files, addressing the heavy initialization cost that comes with data loading. We present adaptive data loading as an alternative to traditional full a priori data loading. We explore the potential of in situ query processing in the context of current DBMS architectures. We identify performance bottlenecks specific to in situ processing and introduce an adaptive indexing mechanism (the positional map) that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure and techniques for collecting statistics over raw data files. Moreover, we design a flexible query engine that is not built around a single storage layout but can exploit different storage layouts and data execution strategies in a single engine. It decides during query processing which design fits the input queries and adapts the underlying data storage accordingly. By applying code generation techniques, we dynamically generate access operators tailored for specific classes of queries. This thesis revises the traditional paradigm of loading, tuning and then querying by using in situ query processing as the principal way to minimize data-to-query time. We show that raw data files should not be considered "outside" the DBMS and that full data loading should not be a requirement for exploiting database technology. On the contrary, techniques specifically tailored to overcome the limitations of accessing raw data files can eliminate the data loading overhead, thereby making raw data files a first-class citizen, fully integrated with the query engine. The proposed roadmap can provide guidance on how to convert any traditional DBMS into an efficient in situ query engine.
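
To make the idea of an index that "maintains positional information" over a raw file more concrete, the following is a minimal sketch, not the thesis's implementation. The class name `PositionalMap`, its methods, and its fields are hypothetical, and the sketch assumes a simple comma-delimited file with no quoted or escaped fields. It records byte offsets of fields as queries touch them, so that a repeated access can seek directly to the value instead of tokenizing the row again.

```python
class PositionalMap:
    """Hypothetical sketch: caches byte offsets of fields in a raw CSV file.

    Assumes plain comma-separated rows without quoted or escaped fields.
    """

    def __init__(self, path, delimiter=b","):
        self.path = path
        self.delimiter = delimiter
        self.row_offsets = []    # byte offset at which each row starts
        self.field_offsets = {}  # (row, col) -> (start, end) byte offsets
        self._index_rows()

    def _index_rows(self):
        # One sequential pass that only records where each row begins.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                self.row_offsets.append(offset)
                offset += len(line)

    def get(self, row, col):
        with open(self.path, "rb") as f:
            cached = self.field_offsets.get((row, col))
            if cached is not None:
                # Known position: jump straight to the field, no tokenizing.
                start, end = cached
                f.seek(start)
                return f.read(end - start).decode()
            # Unknown position: read the row so it can be tokenized below.
            row_start = self.row_offsets[row]
            f.seek(row_start)
            line = f.readline().rstrip(b"\r\n")
        # Tokenize and remember the position of every field up to `col`.
        pos = 0
        for i, field in enumerate(line.split(self.delimiter)):
            start = row_start + pos
            self.field_offsets[(row, i)] = (start, start + len(field))
            pos += len(field) + len(self.delimiter)
            if i == col:
                return field.decode()
        raise IndexError(f"row {row} has no column {col}")
```

The map here grows only with the fields that queries actually request, which is one simple reading of "adaptive"; a real system would also need to bound its memory use and handle richer file formats.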

