Interactive applications require processing tens to hundreds of concurrent analytical queries within tight time constraints. In such setups, where high concurrency causes contention, work-sharing databases are critical for improving scalability and for bounding the increase in response time. However, as such databases share data access using full scans and expensive shared filters, they suffer from a data-access bottleneck that jeopardizes interactivity. We present SH2O: a novel data-access operator that addresses the data-access bottleneck of work-sharing databases. SH2O is based on the idea that an access pattern based on judiciously selected multidimensional ranges can replace a set of shared filters. To exploit the idea in an efficient and scalable manner, SH2O uses a three-tier approach: i) it uses spatial indices to efficiently access the ranges without overfetching, ii) it uses an optimizer to choose which filters to replace such that it maximizes the cost-benefit of index accesses, and iii) it exploits partitioning schemes and independently accesses each data partition to reduce the number of filters in the access pattern. Furthermore, we propose a tuning strategy that chooses a partitioning and indexing scheme that minimizes SH2O's cost for a target workload. Our evaluation shows a speedup of 1.8× to 22.2× for batches of hundreds of data-access-bound queries.
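The core idea, that range-based access can stand in for a set of shared filters, can be illustrated with a toy sketch. This is not SH2O's implementation: the uniform grid below is only a stand-in for a spatial index, and all names (`RangeFilter`, `GridIndex`, `shared_scan`, `range_access`) are hypothetical. The sketch compares a full scan that tests every row against every query's filter with an access pattern that visits only the index cells overlapping some query's range.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeFilter:
    """Axis-aligned 2-D range predicate (inclusive bounds)."""
    x_lo: int
    x_hi: int
    y_lo: int
    y_hi: int

    def matches(self, row):
        x, y = row
        return self.x_lo <= x <= self.x_hi and self.y_lo <= y <= self.y_hi


def shared_scan(rows, filters):
    """Baseline: one full scan; every row is tested against every query's filter."""
    results = {q: [] for q in filters}
    for row in rows:
        for q, f in filters.items():
            if f.matches(row):
                results[q].append(row)
    return results


class GridIndex:
    """Toy uniform grid, assumed here as a stand-in for a spatial index."""

    def __init__(self, rows, cell=10):
        self.cell = cell
        self.cells = defaultdict(list)
        for row in rows:
            self.cells[(row[0] // cell, row[1] // cell)].append(row)

    def cells_for(self, f):
        """Keys of non-empty grid cells that overlap the filter's rectangle."""
        keys = set()
        for cx in range(f.x_lo // self.cell, f.x_hi // self.cell + 1):
            for cy in range(f.y_lo // self.cell, f.y_hi // self.cell + 1):
                if (cx, cy) in self.cells:
                    keys.add((cx, cy))
        return keys


def range_access(index, filters):
    """Visit each needed cell once; route rows only to queries touching that cell."""
    cell_to_queries = defaultdict(list)
    for q, f in filters.items():
        for key in index.cells_for(f):
            cell_to_queries[key].append(q)

    results = {q: [] for q in filters}
    touched = 0  # number of rows actually read through the index
    for key, queries in cell_to_queries.items():
        for row in index.cells[key]:
            touched += 1
            for q in queries:
                if filters[q].matches(row):
                    results[q].append(row)
    return results, touched


# Usage: three concurrent queries over a 100 x 100 table of (x, y) rows.
rows = [(x, y) for x in range(100) for y in range(100)]
filters = {
    "q1": RangeFilter(0, 14, 0, 14),
    "q2": RangeFilter(10, 19, 10, 19),
    "q3": RangeFilter(90, 99, 0, 9),
}
index = GridIndex(rows, cell=10)
by_scan = shared_scan(rows, filters)
by_range, touched = range_access(index, filters)
print(f"rows read: full scan = {len(rows)}, range access = {touched}")
```

Both strategies return the same rows per query, but the range-based pattern reads only the cells that some query needs and evaluates a filter only for queries overlapping that cell. Overlapping ranges (here `q1` and `q2`) share a single visit to their common cell, which is the work-sharing aspect; choosing which filters are worth replacing, and how to partition the data, is the part the abstract attributes to SH2O's optimizer and tuning strategy and is not modeled here.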