Abstract

Due to the increasing demand for scalable and interactive data analytics, column stores have become the de facto choice in many analytical databases. As a common and fundamental operation in column stores, expression evaluation has a significant effect on the performance of many queries. To speed up expression evaluation, vectorized techniques such as Single-Instruction-Multiple-Data (SIMD) instructions are widely used. However, few works address dedicated optimizations for SIMD-based expression evaluation in column stores. In this paper, we propose a runtime optimization framework named ROVEC that enables effective optimizations for SIMD-based expression evaluation. The key idea is to optimize logical expressions at execution time by leveraging lightweight compression and the fine-grained statistics associated with the compressed data. ROVEC removes unnecessary type casting and finds the tightest type during evaluation, which maximizes the number of operands processed concurrently by each SIMD instruction. ROVEC can be applied to many expression-evaluation-intensive operators (e.g., table scan and theta join) and to different data types (e.g., numeric, time, and string). To validate the effectiveness of ROVEC, we integrate it into a columnar database, PolarDB-C. Our evaluation results show that ROVEC improves the throughput of table scan by up to 120% (60% on average) and reduces the latency of theta join by up to 50% (30% on average).
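
To illustrate why a tighter type raises SIMD parallelism, the following minimal sketch picks the narrowest signed integer width that can hold a block's value range and reports how many such values fit in one register. It is not ROVEC's actual algorithm: the per-block (min, max) statistics, the function names, and the 256-bit register width (as in AVX2) are assumptions made for illustration only.

```python
# Hypothetical sketch, not the paper's implementation: derive the narrowest
# signed integer width from assumed per-block (min, max) statistics and
# count how many values of that width fit in one SIMD register.

SIGNED_WIDTHS = (8, 16, 32, 64)  # candidate widths in bits


def tightest_width(lo, hi):
    """Return the smallest signed width (in bits) that can hold [lo, hi]."""
    for bits in SIGNED_WIDTHS:
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return bits
    raise OverflowError("value range does not fit in 64 bits")


def lanes(bits, register_bits=256):
    """Number of values of the given width per register (256 bits assumed)."""
    return register_bits // bits


def add_result_width(a_stats, b_stats):
    """Width needed for a + b, given (min, max) statistics of each operand."""
    lo = a_stats[0] + b_stats[0]
    hi = a_stats[1] + b_stats[1]
    return tightest_width(lo, hi)


if __name__ == "__main__":
    # Two columns declared as 64-bit whose blocks only hold values in [0, 100].
    width = add_result_width((0, 100), (0, 100))
    print(width, lanes(width), lanes(64))
```

Under these assumptions, the sum of two such blocks needs only 16 bits, so one 256-bit register holds 16 values instead of the 4 it would hold if the expression were evaluated at the declared 64-bit width.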
