Abstract

Gradient Boosted Decision Trees (GBDT) is a practical machine learning method that has been widely used in various application fields such as recommendation systems. Optimizing the performance of GBDT on heterogeneous many-core processors exposes several challenges, such as designing an efficient parallelization scheme and mitigating the latency of irregular memory accesses. In this paper, we propose swGBDT, an efficient GBDT implementation on the Sunway processor. In swGBDT, we divide the 64 CPEs in a core group into multiple roles, such as loaders, savers and workers, in order to hide the latency of irregular global memory accesses. In addition, we partition the data at two granularities, blocks and tiles, to better utilize the LDM on each CPE for data caching. Moreover, we utilize register communication for collaboration among CPEs. Our evaluation with representative datasets shows that swGBDT achieves 4.6\(\times \) and 2\(\times \) performance speedup on average compared to the serial implementation on the MPE and parallel XGBoost on CPEs, respectively.

Highlights

  • In recent years, machine learning has gained great popularity as a powerful technique in the field of big data analysis

  • We compare the performance of our swGBDT with a serial implementation on the Management Processing Element (MPE) and a parallel XGBoost [3] on the Computation Processing Elements (CPEs)

  • The serial implementation is a naive implementation of our Gradient Boosted Decision Tree (GBDT) algorithm without using the CPEs

Summary

Introduction

In recent years, machine learning has gained great popularity as a powerful technique in the field of big data analysis. Gradient Boosted Decision Tree (GBDT) [6] is a widely used machine learning technique for analyzing massive data with various features and sophisticated dependencies [17]. GBDT is an ensemble machine learning model that requires training multiple decision trees sequentially. The Sunway SW26010 processor organizes its cores into core groups (CGs), each containing one Management Processing Element (MPE) and 64 Computing Processing Elements (CPEs). The MPE, whose structure is similar to that of mainstream processors, is in charge of task scheduling, while the CPEs are designed for high computing throughput, each with a 16 KB L1 instruction cache and a 64 KB programmable Local Data Memory (LDM). There are two methods for accessing main memory in the CG from the LDM of a CPE: DMA and global load/store (gld/gst). DMA provides much higher bandwidth than gld/gst for contiguous memory accesses. The SW26010 architecture also provides an efficient and reliable register communication mechanism between CPEs within the same row or column, which has even higher bandwidth than DMA.
