Quantifying the energy consumption of database operations is the foundation for building energy-efficient database systems. Existing approaches have focused only on stand-alone database servers, whereas we are interested in modeling the energy consumption of database operations in distributed environments. In this study, we aim to provide an accurate energy consumption model for queries executed in distributed database systems, profiling the energy consumption characteristics of both individual queries and the system as a whole, in order to guide the design of green database systems and to reveal opportunities for energy-efficient computing. Because the execution of a distributed query is a combination of the subqueries decomposed from it, we start by building energy models for individual subqueries, extracting the basic operations that effectively reflect their energy consumption. We then use a bottom-up measuring and modeling method as the basis for a comprehensive energy estimation model for the entire distributed query. To validate the accuracy of the model, we run queries of varying complexity, generated from three standard benchmarks (TPC-H, SSB, and Sysbench), on real distributed databases. Extensive experimental results show that our solution achieves an average accuracy of 94.63% for distributed queries. More importantly, based on these results we further explore the energy consumption patterns of distributed queries and present several important implications. Finally, we find that distributed joins play a significant role in saving both the idle and the dynamic energy cost of the system. We hope these observations will help readers who wish to substantially improve the energy efficiency of distributed database systems.
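The bottom-up idea described above can be illustrated with a minimal sketch. It is not the model from the study: the operation names, the per-operation unit costs, the linear form of the subquery model, and the relative-error definition of accuracy are all assumptions made here for illustration. The sketch only shows the general shape of the approach, in which each subquery's energy is estimated from counts of basic operations plus an idle component, and the query estimate is the sum over its subqueries.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical energy cost per basic operation, in joules.
# These names and values are illustrative assumptions, not measured results.
UNIT_COST_J = {
    "tuple_scan": 2e-6,
    "tuple_join": 5e-6,
    "byte_sent": 1e-7,
}


@dataclass
class Subquery:
    """One subquery of a distributed query (hypothetical structure)."""
    op_counts: dict[str, int]  # how many times each basic operation runs
    duration_s: float          # execution time on its node, in seconds


def subquery_energy(sq: Subquery, idle_power_w: float) -> float:
    """Estimate energy (J) as dynamic cost of basic operations plus idle cost."""
    dynamic = sum(UNIT_COST_J[op] * n for op, n in sq.op_counts.items())
    idle = idle_power_w * sq.duration_s
    return dynamic + idle


def query_energy(subqueries: list[Subquery], idle_power_w: float) -> float:
    """Bottom-up estimate: sum the estimates of all subqueries."""
    return sum(subquery_energy(sq, idle_power_w) for sq in subqueries)


def accuracy(estimated_j: float, measured_j: float) -> float:
    """Assumed accuracy metric: 1 minus the relative estimation error."""
    return 1.0 - abs(estimated_j - measured_j) / measured_j
```

Under these assumptions, a query would be described as a list of `Subquery` objects, passed to `query_energy` together with an assumed idle power for the nodes, and the result compared against a metered value with `accuracy`.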