Abstract

In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to their excellent results on classification and regression tasks and to open-source libraries such as XGBoost, GBDTs have become very popular in recent years and have powered many winning solutions in machine learning and data mining competitions. Although GPUs have demonstrated their success in accelerating many machine learning applications, it is challenging to develop an efficient GPU-based GBDT algorithm. The key challenges include irregular memory accesses, many sorting operations with small inputs, and varying granularities of data parallelism during tree construction. To tackle these challenges on GPUs, we propose several novel techniques, including (i) Run-Length Encoding compression and dynamic allocation of thread/block workloads, (ii) data partitioning based on stable sort, together with fast and memory-efficient attribute ID lookup during node splitting, (iii) finding approximate split points using two-stage histogram building, (iv) sparsity-aware histogram building and histogram subtraction to reduce the histogram building workload, (v) reusing intermediate training results for efficient gradient computation, and (vi) exploiting multiple GPUs to handle larger data sets efficiently. Our experimental results show that our algorithm, named ThunderGBM, can be 10 times faster than the state-of-the-art libraries (i.e., XGBoost, LightGBM and CatBoost) running on a relatively high-end workstation with 20 CPU cores. Compared with the GPU versions of these libraries, ThunderGBM can handle higher-dimensional problems for which the other libraries become extremely slow or simply fail. For the data sets that the existing GPU libraries can handle, ThunderGBM achieves up to 10 times speedup on the same hardware, which demonstrates the significance of our GPU optimizations. Moreover, the models trained by ThunderGBM are identical to those trained by XGBoost, and have similar quality to those trained by LightGBM and CatBoost.
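
To give a flavour of one of the techniques listed above, the sketch below illustrates the well-known histogram-subtraction idea in plain C++; the names `BinStats` and `subtract_histogram` are illustrative only and are not part of ThunderGBM's API. The idea is that once the histograms of a parent node and of one of its children have been built, the other child's histogram can be obtained by a cheap bin-wise subtraction rather than a second pass over that child's training instances.

```cpp
#include <cstddef>
#include <vector>

// Per-bin statistics accumulated during histogram building: sums of the
// first- and second-order gradients of the instances falling into the bin.
struct BinStats {
    double sum_grad = 0.0;
    double sum_hess = 0.0;
};

// Hypothetical helper showing the histogram-subtraction trick: the histogram
// of a child node equals the parent's histogram minus the sibling's histogram,
// computed bin by bin.
std::vector<BinStats> subtract_histogram(const std::vector<BinStats>& parent,
                                         const std::vector<BinStats>& sibling) {
    std::vector<BinStats> child(parent.size());
    for (std::size_t b = 0; b < parent.size(); ++b) {
        child[b].sum_grad = parent[b].sum_grad - sibling[b].sum_grad;
        child[b].sum_hess = parent[b].sum_hess - sibling[b].sum_hess;
    }
    return child;
}
```

In practice this means only the smaller of the two children needs its histogram built from the raw instances, roughly halving the histogram-building workload per split; the actual GPU implementation in ThunderGBM is more involved than this sketch.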
