Abstract

Evolutionary algorithms (EAs) are naturally well suited to parallel processing. However, when they are applied to data mining, the fitness calculations begin to dominate, and the typical population-based decomposition limits parallel efficiency. When dealing with large-scale data, finding a scalable solution can become a real challenge. In this article, we propose a GPU-based parallelization of the evolutionary induction of model trees. Such trees are a special case of decision trees (DTs) designed to solve regression problems. The evolutionary approach provides not only robust prediction but also preserves the simplicity of DTs. However, this global approach is much more computationally demanding than state-of-the-art greedy inducers and is therefore hard to apply directly to large-scale data mining. A parallelized induction of model trees (with univariate tests in the internal nodes and multiple linear regression models in the leaves) requires a carefully designed decomposition strategy. Six GPU-supported procedures are designed to successively redistribute, sort, and rearrange dataset samples; then calculate models and fitness; and finally gather the results. Experimental validation is performed on real-life and artificial datasets, using various (low- and high-end) GPU accelerators. The results show that the GPU-supported solution enables time-efficient global induction of model trees on large-scale data, which until now was reserved for greedy methods. The obtained speedup is very satisfactory (up to hundreds of times), and the solution scales across datasets of different sizes and dimensions.
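
To make the data-parallel part of this decomposition concrete, the following is a minimal CUDA sketch, not the authors' implementation: it only illustrates the step in which each sample, already routed to a leaf, is evaluated against that leaf's multiple linear regression model and the per-leaf squared errors are accumulated for the fitness calculation. All names (leafSquaredError, leaf_of, coeffs, leaf_sse) are hypothetical.

```cuda
// Hedged sketch of per-sample fitness work on the GPU (assumed layout:
// row-major samples, per-leaf coefficient vectors of length n_attrs + 1).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void leafSquaredError(const float *x,       // samples, row-major [n_samples x n_attrs]
                                 const float *y,       // regression targets  [n_samples]
                                 const int   *leaf_of, // leaf index per sample (after redistribution/sorting)
                                 const float *coeffs,  // per-leaf model: intercept + weights [n_leaves x (n_attrs+1)]
                                 float *leaf_sse,      // per-leaf sum of squared errors [n_leaves]
                                 int n_samples, int n_attrs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_samples) return;

    int leaf = leaf_of[i];
    const float *w = coeffs + leaf * (n_attrs + 1);

    // Multiple linear regression prediction: intercept + dot(weights, attributes).
    float pred = w[0];
    for (int a = 0; a < n_attrs; ++a)
        pred += w[a + 1] * x[i * n_attrs + a];

    float err = pred - y[i];
    atomicAdd(&leaf_sse[leaf], err * err);  // gathered on the host afterwards
}

int main()
{
    // Toy example: 4 samples, 2 attributes, 2 leaves, both leaves using y = x1 + 2*x2.
    const int n_samples = 4, n_attrs = 2, n_leaves = 2;
    float h_x[]    = {1, 2,  3, 4,  5, 6,  7, 8};
    float h_y[]    = {5, 11, 17, 23};
    int   h_leaf[] = {0, 0, 1, 1};
    float h_w[]    = {0, 1, 2,  0, 1, 2};
    float h_sse[n_leaves] = {0, 0};

    float *d_x, *d_y, *d_w, *d_sse; int *d_leaf;
    cudaMalloc(&d_x,    sizeof(h_x));    cudaMemcpy(d_x,    h_x,    sizeof(h_x),    cudaMemcpyHostToDevice);
    cudaMalloc(&d_y,    sizeof(h_y));    cudaMemcpy(d_y,    h_y,    sizeof(h_y),    cudaMemcpyHostToDevice);
    cudaMalloc(&d_leaf, sizeof(h_leaf)); cudaMemcpy(d_leaf, h_leaf, sizeof(h_leaf), cudaMemcpyHostToDevice);
    cudaMalloc(&d_w,    sizeof(h_w));    cudaMemcpy(d_w,    h_w,    sizeof(h_w),    cudaMemcpyHostToDevice);
    cudaMalloc(&d_sse,  sizeof(h_sse));  cudaMemcpy(d_sse,  h_sse,  sizeof(h_sse),  cudaMemcpyHostToDevice);

    leafSquaredError<<<1, 128>>>(d_x, d_y, d_leaf, d_w, d_sse, n_samples, n_attrs);
    cudaMemcpy(h_sse, d_sse, sizeof(h_sse), cudaMemcpyDeviceToHost);

    printf("SSE leaf 0 = %f, leaf 1 = %f\n", h_sse[0], h_sse[1]);
    return 0;
}
```

The sketch covers only the embarrassingly parallel per-sample work; the redistribution, sorting, model fitting, and result-gathering procedures described in the article are separate GPU-supported steps not shown here.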
