Multi-GPU approach to global induction of classification trees for large-scale data mining

Krzysztof Jurczuk, Marcin Czajkowski, Marek Kretowski

doi:10.1007/s10489-020-01952-5

Open Access

https://doi.org/10.1007/s10489-020-01952-5

Journal: Applied Intelligence	Publication Date: Jan 14, 2021
Citations: 17	License type: open-access

Affiliation: Bialystok University of Technology

Abstract

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously and thus, in many situations, yields improvements in the prediction performance and size of the resulting classifiers. However, it is a population-based and iterative approach, which can make it too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
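
The data-parallel idea described in the abstract can be illustrated with a minimal sketch. It is written in plain Python so that it runs without a GPU, and it is not the authors' CUDA implementation. The tuple-based tree encoding, the function names (`route`, `partial_counts`, `split_data`, `accuracy`), and the use of training accuracy as a stand-in fitness are all assumptions made for illustration. Each data chunk plays the role of one device: it routes its own instances to leaves and counts classes per leaf, and the host then merges the partial counts.

```python
from collections import Counter

# Hypothetical tree encoding (an assumption, not the paper's data structure):
# an internal node is ("split", feature_index, threshold, left, right),
# a leaf is ("leaf", leaf_id).


def route(tree, x):
    """Return the id of the leaf that instance x reaches."""
    node = tree
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]


def partial_counts(tree, chunk):
    """Work for one chunk (one simulated device): count (leaf, class) pairs."""
    counts = Counter()
    for x, label in chunk:
        counts[(route(tree, x), label)] += 1
    return counts


def split_data(data, n_devices):
    """Data-parallel decomposition: contiguous slices of the instances."""
    size = max(1, (len(data) + n_devices - 1) // n_devices)
    return [data[i:i + size] for i in range(0, len(data), size)]


def accuracy(tree, data, n_devices):
    """Merge per-chunk counts on the host and score majority-class leaves.

    Assumes data is non-empty.
    """
    total = Counter()
    for chunk in split_data(data, n_devices):
        total += partial_counts(tree, chunk)
    # For each leaf, keep the size of its largest class (its correct predictions).
    best = {}
    for (leaf, _label), n in total.items():
        best[leaf] = max(best.get(leaf, 0), n)
    return sum(best.values()) / len(data)
```

Because the per-chunk counts are simply summed, the result does not depend on how many chunks the data is split into. That property is what lets the work be spread over several devices in a scheme of this kind.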

Highlights

In the last decade, the term Big Data has become extremely popular in business, industry, and science [14].
We have focused on graphics processing units (GPUs) to parallelize the evolutionary induction of decision trees (DT).

Summary

Introduction

The term Big Data has become extremely popular in business, industry, and science [14]. It refers to the storage and handling of large or complex datasets that are almost impossible to manage using traditional tools. The recent use of evolutionary algorithms (EAs) in tree induction [2] can be seen as a breath of fresh air. Their main advantage is the global approach, in which the tree structure, the tests in internal nodes, and the predictions in leaves are searched simultaneously [38]. The generated trees are significantly simpler and at least as accurate as the greedy alternatives.

Similar Papers

GPU-based acceleration of evolutionary induction of model trees
Krzysztof Jurczuk ... Marek Kretowski
Applied Soft Computing | VOL. 119
29 Jan 2022

GPU-Accelerated Evolutionary Induction of Regression Trees
Krzysztof Jurczuk ... Marcin Czajkowski
01 Jan 2017

Accelerating GPU-based Evolutionary Induction of Decision Trees - Fitness Evaluation Reuse
Krzysztof Jurczuk ... Marek Kretowski
01 Jan 2020

Evolutionary Induction of Cost-Sensitive Decision Trees
Marek Krętowski ... Marek Grześ
01 Jan 2006
