Abstract

Predicate pushdown is a widely adopted query optimization. Existing systems and prior work mostly use pattern-matching rules to decide when a predicate can be pushed through certain operators like join or groupby. However, challenges arise in optimizing for data science pipelines due to the widely used non-relational operators and user-defined functions (UDF) that existing rules would fail to cover. In this paper, we present MagicPush, which decides predicate pushdown using a search-verification approach.MagicPush searches for candidate predicates on pipeline input, which is often not the same as the predicate to be pushed down, and verifies that the pushdown does not change pipeline output with full correctness guarantees. Our evaluation on TPC-H queries and 200 real-world pipelines sampled from GitHub Notebooks shows that MagicPush substantially outperforms a strong baseline that uses a union of rules from prior work - it is able to discover new pushdown opportunities and better optimize 42 real-world pipelines with up to 99% reduction in running time, while discovering all pushdown opportunities found by the existing baseline on remaining cases.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call