Abstract

Many database applications, such as OnLine Analytical Processing (OLAP), web-based information extraction or scientific computation, need to select a subset of fields based on several user-defined filters. Developers of these applications require effective assembly methods for on-demand filtering and aggregation, which raises new challenges in deploying parallel computing components on top of columnar storage. To efficiently generate qualified records, an asynchronous skipping strategy is presented to speed up filtering and decoding in the column-based storage. Concentrating on filtering-pushdown in parallel analytical workloads, we offer in-depth analysis on record assembly. We highlight the bottleneck of traditional record-wise assembling methods in the cases of evaluating analytical tasks on a nested schema. With a concurrent queue structure, an asynchronous skipping strategy is presented to evaluate column scan separately by a software pipeline involving a different set of threads. We show how to intensively read the sequential blocks of each column, and how to effectively eliminate invalid payloads by integrating filtering-pushdown in an asynchronous I/O stack. We implement a columnar storage supporting filtering-pushdown in nested schema. Our experiments are conducted on a de-facto standard benchmark using both variant-selectivity scans and ad-hoc queries. The results revealed that in parallel I/O-intensive workloads, our implementation improved the I/O performance of the state-of-the arts by 1.3X∼2.7X. Coupling the asynchronous strategy with filtering-pushdown, our implementation remarkably outperforms its competitors with heavyweight coding workloads on both HDD and SSD.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call