Abstract

The extreme learning machine (ELM) has been used effectively for training single-hidden-layer neural networks. In recent years, great attention has been paid to deep extreme learning machine (D-ELM) structures, in which deep neural networks are trained via the ELM method: a stack of auto-encoders followed by a simple ELM layer is typically used to solve classification and regression tasks. Although ELM speeds up the training of neural networks, D-ELM-based models suffer from issues such as high time complexity and long running times. In this paper, we explore how the evaluation of ELM and D-ELM can be accelerated by using GPUs to speed up the training of both models. The proposed method splits each algorithm into three phases. In the first phase, loading and pre-processing the data are performed serially on the CPU. In the second and third phases, which are respectively the training and testing phases of the algorithm, all matrix operations are executed in parallel using the GPU memory hierarchy (a host-side sketch of this phase split appears after the contribution list below). In addition, access to highly efficient computational libraries provides further support for GPU-based parallel computing. In the simulation setup, five datasets are used to train ELM and D-ELM on both CPU and GPU platforms. The results show that the proposed GPU-based approach saves a remarkable amount of running time: while the serial and parallel methods reach approximately the same accuracy, the parallel implementations reduce the run time significantly.

In summary, we propose novel parallel ELM and D-ELM algorithms on the GPU. These algorithms reduce training time and overall runtime significantly while delivering remarkable predictive accuracy. To the best of our knowledge, this is the first parallelization of D-ELM that exploits the high computing power of CUDA-equipped graphics processors. Our main contributions are as follows:

• All sub-operations of ELM and D-ELM that can be parallelized are parallelized on the GPU, thus benefiting from its memory hierarchy. Several further optimizations improve the training time and performance of our parallelization.
• Maximizing parallel performance: it is essential to organize the algorithm into computational blocks that can run independently, and communication between the blocks is minimized to achieve better performance (see the tiled-kernel sketch following this list).
• Improving the use of the memory hierarchy: it is especially important to maximize the number of operations executed on data stored in shared memory. This strategy is applied to both the ELM and D-ELM algorithms (the same tiled-kernel sketch below illustrates it).
• Using optimized CUDA libraries: for some operations of the ELM and D-ELM algorithms, these tools are used to improve performance (a hedged library-based sketch appears below).
• Maximizing the independent and concurrent execution of the auto-encoders in the D-ELM algorithm: since a large share of the computation is concentrated in this stage, executing the auto-encoders concurrently and independently is decisive for performance (see the streams sketch below).
• Standard implementations of ELM and D-ELM learn slowly as the scale of the data increases. We have implemented optimized serial versions of the ELM and D-ELM algorithms as a baseline and report the training-time cost of different strategies on datasets of different scales. The serial evaluations show that the training time of ELM and D-ELM is almost linearly related to the dataset size.
• Our proposed parallel algorithms achieve good training times and prediction accuracy and offer high performance for iterative calculations; other machine-learning algorithms with heavy iterative computations can therefore also benefit from the proposed methods.
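The three-phase organization described in the abstract can be made concrete with a short host-side skeleton. The paper does not publish its code, so everything below (the function names, the placeholder kernels, and the single host-to-device copy) is an illustrative assumption of how a CPU-load / GPU-train / GPU-test split is typically wired up in CUDA C++:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder kernels standing in for the paper's (unpublished) training and
// testing matrix operations; names and bodies are illustrative assumptions.
__global__ void trainKernel(float* X, float* W, int n) { /* training-phase matrix ops */ }
__global__ void testKernel(const float* X, const float* W, float* Y, int n) { /* testing-phase ops */ }

int main() {
    const int n = 1 << 20;

    // Phase 1 (CPU, serial): load and pre-process the data.
    std::vector<float> hostX(n);
    for (int i = 0; i < n; ++i) hostX[i] = float(i % 100) / 100.0f;  // stand-in for real loading

    // Move the prepared data to the GPU once, before the parallel phases.
    float *dX, *dW, *dY;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dW, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemcpy(dX, hostX.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;

    // Phase 2 (GPU): training; all matrix operations run in parallel.
    trainKernel<<<blocks, threads>>>(dX, dW, n);

    // Phase 3 (GPU): testing with the trained weights.
    testKernel<<<blocks, threads>>>(dX, dW, dY, n);
    cudaDeviceSynchronize();

    cudaFree(dX); cudaFree(dW); cudaFree(dY);
    return 0;
}
```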
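The contributions on independent computational blocks and on shared-memory usage correspond to the classic CUDA tiling pattern. The paper's kernels are not published, so the sketch below is the standard tiled matrix multiplication: each thread block computes one output tile independently (no inter-block communication) and stages its operands in shared memory so that each global-memory element is read once per tile. The tile width and all identifiers are our assumptions:

```cuda
#include <cuda_runtime.h>

#define TILE 16  // tile width; a tunable assumption, not a value from the paper

// C = A * B for row-major M x K and K x N matrices.
__global__ void tiledMatMul(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        // Stage one tile of A and one tile of B in shared memory,
        // zero-padding out-of-range elements at the matrix borders.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();                      // tile fully staged before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                      // done with this tile
    }
    if (row < M && col < N)
        C[row * N + col] = acc;
}
```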
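For the contribution on optimized CUDA libraries, a natural target in ELM is the output-weight solve. In regularized ELM one computes beta = (H^T H + lambda*I)^{-1} H^T T, where H (n x L) is the hidden-layer output matrix and T (n x m) holds the targets. The paper does not state which routines it uses, so this cuBLAS/cuSOLVER sketch (column-major storage, a Cholesky factorization, and the ridge constant lambda are all our assumptions) only illustrates the pattern:

```cuda
#include <cublas_v2.h>
#include <cusolverDn.h>
#include <cuda_runtime.h>

// Adds lambda to the diagonal of the L x L column-major matrix A.
__global__ void addDiag(float* A, int L, float lambda) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < L) A[i * L + i] += lambda;
}

// Solves beta = (H^T H + lambda I)^{-1} H^T T on the device.
// H: n x L, T: n x m; beta is returned in B (L x m); all column-major.
void solveOutputWeights(cublasHandle_t blas, cusolverDnHandle_t solver,
                        const float* H, const float* T,
                        float* A /* L x L scratch */, float* B /* L x m, out */,
                        int n, int L, int m, float lambda) {
    const float one = 1.0f, zero = 0.0f;

    // A = H^T * H (L x L) and B = H^T * T (L x m) via cuBLAS GEMM.
    cublasSgemm(blas, CUBLAS_OP_T, CUBLAS_OP_N, L, L, n, &one, H, n, H, n, &zero, A, L);
    cublasSgemm(blas, CUBLAS_OP_T, CUBLAS_OP_N, L, m, n, &one, H, n, T, n, &zero, B, L);

    // A += lambda * I (the ridge term keeps A positive definite).
    addDiag<<<(L + 255) / 256, 256>>>(A, L, lambda);

    // Cholesky-factorize A, then solve A * beta = B in place.
    int lwork = 0, *devInfo;
    cudaMalloc(&devInfo, sizeof(int));
    cusolverDnSpotrf_bufferSize(solver, CUBLAS_FILL_MODE_LOWER, L, A, L, &lwork);
    float* work;
    cudaMalloc(&work, lwork * sizeof(float));
    cusolverDnSpotrf(solver, CUBLAS_FILL_MODE_LOWER, L, A, L, work, lwork, devInfo);
    cusolverDnSpotrs(solver, CUBLAS_FILL_MODE_LOWER, L, m, A, L, B, L, devInfo);

    cudaFree(work);
    cudaFree(devInfo);
}
```

Because the hidden layer usually has far fewer neurons than there are samples (L much smaller than n), factorizing the L x L Gram matrix in this way is considerably cheaper than decomposing H itself.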
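For the concurrent execution of the D-ELM auto-encoders, CUDA streams are the usual mechanism: work submitted to different streams may overlap on the GPU when there are no data dependencies between them. The paper gives no code for this stage, so the kernel below is a placeholder and all names are hypothetical; how much overlap is achievable in practice depends on the dependencies between the stacked encoders:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-auto-encoder work: the paper does not publish its kernels,
// so trainAutoEncoderStep stands in for the matrix operations of one encoder.
__global__ void trainAutoEncoderStep(float* layerData, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) layerData[i] *= 0.5f;   // placeholder computation
}

// layerData[e] is a device pointer to the e-th encoder's working buffer.
void launchEncodersConcurrently(float** layerData, const int* sizes, int numEncoders) {
    // One stream per auto-encoder: kernels in different streams can execute
    // concurrently when there are no data dependencies between them.
    cudaStream_t* streams = new cudaStream_t[numEncoders];
    for (int e = 0; e < numEncoders; ++e)
        cudaStreamCreate(&streams[e]);

    for (int e = 0; e < numEncoders; ++e) {
        int threads = 256;
        int blocks = (sizes[e] + threads - 1) / threads;
        trainAutoEncoderStep<<<blocks, threads, 0, streams[e]>>>(layerData[e], sizes[e]);
    }

    for (int e = 0; e < numEncoders; ++e) {
        cudaStreamSynchronize(streams[e]);   // wait for all encoders to finish
        cudaStreamDestroy(streams[e]);
    }
    delete[] streams;
}
```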
