Accelerating Joins and Aggregations on the Oracle In-Memory Database

Shasank Chavan, Ajit Mylavarapu, Albert Hopeman, Ekrem Soylemez, Sangho Lee, Dennis Lui

doi:10.1109/icde.2018.00163

Abstract

OLAP and real-time analytic workloads in data management systems are dominated by the costs of joins, aggregations, scans, and filtering. In-memory columnar databases have sped up scans by orders of magnitude using compressed data formats and SIMD vectorization techniques, but have had little impact on the rest of the query execution plan. The Oracle Database In-Memory (DBIM) Option introduced new SQL execution operators that accelerate a wide range of analytic queries, delivering orders-of-magnitude performance improvements by optimizing aggregation over joins for star and similar schemas. Group-by expressions are pushed down into the scans of dimension tables, creating a unique key per distinct group called a Dense Grouping Key (DGK). A structure called a Key Vector is allocated that maps join keys to DGKs and is used to filter out non-matching rows during the fact table scan. Passing rows are then aggregated directly on compressed codes into DGK-indexed result buffers using SIMD and other novel aggregation techniques. Our solution replaces traditional join and group-by processing (Bloom filters, hash table build and probe, serial aggregation) with fast inlined scan operators. With DBIM's dual-format architecture, DML activity (inserts, updates, deletes) does not dampen the gains we see with our solution. Using a set of aggregation-heavy queries against the Star Schema Benchmark (SSB) schema, we show that our technique reduces query elapsed time by more than 10x, making real-time analytics achievable.
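The flow described in the abstract (dimension scan producing DGKs, a Key Vector mapping join keys to DGKs, then a fact scan that filters and aggregates into DGK-indexed buffers) can be illustrated with a minimal Python sketch. This is not Oracle's implementation: the function names, the toy tables, and the use of 0 as a "no match" sentinel are assumptions made for illustration, join keys are assumed to be small non-negative integers, and plain loops over uncompressed values stand in for the SIMD processing of compressed codes.

```python
def build_key_vector(dim_rows, join_key, group_col, predicate):
    """Scan a dimension table and map each join key to a dense grouping key.

    DGKs start at 1; 0 is a sentinel (an assumption of this sketch) meaning
    the join key was filtered out or does not exist.
    """
    dgk_of_group = {}  # group value -> DGK
    max_key = max(row[join_key] for row in dim_rows)
    key_vector = [0] * (max_key + 1)
    for row in dim_rows:
        if predicate(row):
            # Assign the next dense key the first time a group value is seen.
            dgk = dgk_of_group.setdefault(row[group_col], len(dgk_of_group) + 1)
            key_vector[row[join_key]] = dgk
    return key_vector, dgk_of_group


def scan_and_aggregate(fact_rows, fk_col, measure_col, key_vector, num_groups):
    """Scan the fact table once, filtering and summing via the key vector."""
    sums = [0] * (num_groups + 1)  # indexed directly by DGK; slot 0 unused
    for row in fact_rows:
        dgk = key_vector[row[fk_col]]  # array lookup replaces a hash probe
        if dgk:  # 0 means the row does not join to a passing dimension row
            sums[dgk] += row[measure_col]
    return sums


# Hypothetical toy data loosely modeled on SSB column names.
supplier = [
    {"s_suppkey": 1, "s_region": "ASIA"},
    {"s_suppkey": 2, "s_region": "EUROPE"},
    {"s_suppkey": 3, "s_region": "ASIA"},
    {"s_suppkey": 4, "s_region": "AFRICA"},
]
lineorder = [
    {"lo_suppkey": 1, "lo_revenue": 100},
    {"lo_suppkey": 2, "lo_revenue": 50},
    {"lo_suppkey": 3, "lo_revenue": 25},
    {"lo_suppkey": 4, "lo_revenue": 999},
    {"lo_suppkey": 1, "lo_revenue": 10},
]

# Equivalent in spirit to: SELECT s_region, SUM(lo_revenue) ...
# WHERE s_region <> 'AFRICA' GROUP BY s_region
key_vector, dgk_of_group = build_key_vector(
    supplier, "s_suppkey", "s_region", lambda r: r["s_region"] != "AFRICA"
)
sums = scan_and_aggregate(
    lineorder, "lo_suppkey", "lo_revenue", key_vector, len(dgk_of_group)
)
result = {group: sums[dgk] for group, dgk in dgk_of_group.items()}
print(result)
```

In this sketch the join and the group-by collapse into one array lookup and one indexed add per fact row, which is the property that lets the scan avoid building and probing a hash table.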

Full Text