Abstract

For dynamic and continuous data analysis, conventional OLTP systems are slow. Today's cutting-edge high-performance computing hardware, such as GPUs, is used to accelerate data-analysis tasks that traditionally run on CPUs in classical database management systems (DBMSs). When CPUs and GPUs are used together, the architectural heterogeneity, that is, jointly leveraging hardware with different performance characteristics, creates complex problems that require careful treatment for performance optimization. Load distribution and balancing are crucial problems for DBMSs running on heterogeneous architectures. In this work, focusing on a hybrid CPU-GPU database management system that processes users' queries, we propose heuristic and machine-learning-based (ML-based) load distribution and balancing models. Specifically, we employ multiple linear regression (MLR), random forest (RF), and AdaBoost (Ada) models to dynamically decide the processing unit for each incoming query based on response-time predictions for both the CPU and the GPU. The ML-based models outperformed the other algorithms, as well as the CPU-only and GPU-only running modes, by up to 27%, 29%, and 40%, respectively, in overall performance (response time) under intense real-life working scenarios. Finally, we propose a hybrid load-balancing model that would be more efficient than the models tested in this work.
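The core scheduling idea in the abstract, predicting a query's response time on each processing unit and dispatching it to the faster one, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the query features, the linear stand-in predictors, and all coefficients below are hypothetical placeholders for the trained MLR/RF/Ada models the authors describe.

```python
# Hypothetical sketch: two learned models predict a query's response time
# on CPU and on GPU, and the scheduler routes the query to the unit with
# the lower prediction. Features and coefficients are illustrative only.
from dataclasses import dataclass

@dataclass
class QueryFeatures:
    rows_scanned: float   # estimated rows touched by the query
    selectivity: float    # fraction of rows surviving predicates
    join_count: float     # number of joins in the plan

def linear_model(weights, bias):
    """Return a predictor f(x) = w . x + b (stand-in for a trained model)."""
    def predict(q: QueryFeatures) -> float:
        x = (q.rows_scanned, q.selectivity, q.join_count)
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict

# Illustrative predictors: the GPU model has a higher fixed cost
# (data-transfer overhead) but a much lower per-row cost.
predict_cpu = linear_model(weights=(1e-3, 5.0, 2.0), bias=0.5)
predict_gpu = linear_model(weights=(1e-4, 5.0, 2.0), bias=3.0)

def dispatch(q: QueryFeatures) -> str:
    """Route the query to the unit with the lower predicted response time."""
    return "CPU" if predict_cpu(q) <= predict_gpu(q) else "GPU"

small = QueryFeatures(rows_scanned=1_000, selectivity=0.1, join_count=1)
large = QueryFeatures(rows_scanned=10_000_000, selectivity=0.1, join_count=1)
print(dispatch(small))  # small scan: transfer overhead dominates -> CPU
print(dispatch(large))  # large scan: GPU throughput wins -> GPU
```

With this structure, swapping the placeholder linear predictors for trained regressors (e.g., random forests) changes only how `predict_cpu` and `predict_gpu` are built; the dispatch rule itself stays the same.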
