Abstract
In relational databases, stored procedures and user-defined functions (UDFs) have long been used to express application business logic through control flow and DML statements. Their use is increasing with the rising popularity of real-time analytics applications, which often contain complex business logic. As a result, hybrid transactional and analytical processing systems such as SAP HANA have started to put more effort into optimizing UDF execution. There has been much work in both program optimization and query optimization. However, most of these studies have been conducted separately, and cross-optimizations that break the boundary between declarative and imperative code have not been studied enough. Therefore, unified optimization techniques that consider both program and query optimization are essential for achieving optimal query performance. The first part of this talk presents a literature overview of a successful approach to improving the performance of UDF execution. This approach transforms entire UDFs into equivalent relational algebra expressions and then applies existing relational query optimization techniques. Interestingly, imperative constructs such as branches and loops can be converted into relational algebra expressions. The approach is attractive because it can take advantage of existing sophisticated query optimization techniques after the transformation. However, one of its most challenging issues is that the transformation rules are currently limited and some imperative expressions cannot be transformed. The second part presents our past and ongoing work on another approach, which unifies program and query optimization techniques in a single framework. The framework consists of plan enumeration and cost estimation for UDFs. We first demonstrate this approach with iterative query processing in a UDF. 
With the notion of query motion, by which an SQL query is moved into or out of a loop, we enumerate execution plans for the UDF. We choose the best plan using a cost model that estimates the procedure cost based on the query optimizer's cost estimates and on the imperative constructs. We next discuss unified optimization using the concept of relational operator motion, in which relational operators are pulled up or pushed down across statements in the UDF to obtain a globally optimized plan for the UDF.
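To make the idea of query motion concrete, the following is a minimal sketch of one possible case: a loop-invariant SQL query is moved out of a loop, and the two resulting plans are compared with a toy cost. It is an illustration only, not the framework described in the talk. The schema (`orders`, `config`), the helper `run`, and the "number of executed queries" cost are all assumptions made for this example; a real cost model would rely on the query optimizer's estimates.

```python
import sqlite3

def make_db():
    # Hypothetical schema and data, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE config (name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 50.0), (1, 150.0), (2, 300.0), (3, 20.0)],
    )
    conn.execute("INSERT INTO config VALUES ('threshold', 100.0)")
    return conn

def run(conn, stats, sql, params=()):
    # Execute a single-row query and count it as one unit of toy cost.
    stats["queries"] += 1
    return conn.execute(sql, params).fetchone()

THRESHOLD_SQL = "SELECT value FROM config WHERE name = 'threshold'"
COUNT_SQL = "SELECT COUNT(*) FROM orders WHERE customer_id = ? AND amount > ?"

def plan_query_in_loop(conn, customer_ids, stats):
    # Original plan: the threshold query does not depend on the loop
    # variable, yet it is re-executed in every iteration.
    result = {}
    for cid in customer_ids:
        (threshold,) = run(conn, stats, THRESHOLD_SQL)
        (n,) = run(conn, stats, COUNT_SQL, (cid, threshold))
        result[cid] = n
    return result

def plan_query_hoisted(conn, customer_ids, stats):
    # Alternative plan: the loop-invariant query is moved out of the loop.
    (threshold,) = run(conn, stats, THRESHOLD_SQL)
    result = {}
    for cid in customer_ids:
        (n,) = run(conn, stats, COUNT_SQL, (cid, threshold))
        result[cid] = n
    return result

def choose_plan(conn, customer_ids):
    # Enumerate both plans and keep the one with the lower toy cost.
    candidates = []
    for plan in (plan_query_in_loop, plan_query_hoisted):
        stats = {"queries": 0}
        result = plan(conn, customer_ids, stats)
        candidates.append((stats["queries"], plan.__name__, result))
    return min(candidates, key=lambda c: c[0])

if __name__ == "__main__":
    cost, name, result = choose_plan(make_db(), [1, 2, 3])
    print(name, cost, result)
```

Both plans return the same result, so they are interchangeable from the caller's point of view; only their cost differs. Note that this sketch runs each plan to measure its cost, whereas an optimizer would estimate costs without executing the candidates.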