Abstract

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the volume of data is large. Big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of the classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce cluster in terms of classification accuracy and computational efficiency.

Highlights

  • Big data has gained momentum in both industry and academia

  • The Message Passing Interface (MPI) is not well suited to big data applications, which normally run for many hours, during which faults may occur

  • This paper presents a MapReduce-based parallel backpropagation neural network (MRBPNN)

Summary

Introduction

Big data has gained momentum in both industry and academia. Many organizations are continuously collecting massive datasets from various sources such as the World Wide Web, sensor networks, and social networks. To fulfill the potential of neural networks in big data applications, the computation process must be sped up with parallel computing techniques such as the Message Passing Interface (MPI) [5, 6]. This paper presents a MapReduce-based parallel backpropagation neural network (MRBPNN). In the first scenario (MRBPNN 1), the input dataset to be classified is segmented into a number of data chunks which are processed by mappers in parallel; each mapper builds the same BPNN classifier using the same set of training data. MRBPNN 2 focuses on a scenario in which the volume of the training data is large. In this case, the training data is segmented into data chunks which are processed by mappers in parallel.
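For concreteness, the following is a minimal, framework-agnostic Python sketch of the MRBPNN 1-style workflow described above: the classification data is split into chunks, each mapper applies the same pre-trained BPNN to its chunk, and a reducer merges the per-chunk predictions. The network architecture, the random weights, and the use of multiprocessing.Pool in place of a Hadoop cluster are illustrative assumptions, not the paper's implementation.

```python
# Sketch only (not the authors' code): chunk-parallel BPNN classification.
from multiprocessing import Pool
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpnn_forward(weights, sample):
    """Feed one sample through a two-layer (hidden + output) BPNN."""
    w_hidden, w_output = weights
    hidden = sigmoid(w_hidden @ sample)
    return sigmoid(w_output @ hidden)

def mapper(args):
    """Map phase: classify one data chunk with the shared, pre-trained BPNN.
    Emits (chunk_id, predicted labels), mirroring MapReduce key/value output."""
    chunk_id, chunk, weights = args
    labels = [int(np.argmax(bpnn_forward(weights, x))) for x in chunk]
    return chunk_id, labels

def reducer(mapped):
    """Reduce phase: merge per-chunk predictions back into one list, ordered by chunk id."""
    merged = []
    for _, labels in sorted(mapped):
        merged.extend(labels)
    return merged

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pre-trained weights: 4 inputs, 8 hidden neurons, 3 output classes.
    weights = (rng.standard_normal((8, 4)), rng.standard_normal((3, 8)))
    data = rng.standard_normal((1000, 4))    # classification data to be labeled
    chunks = np.array_split(data, 4)         # one chunk per mapper
    tasks = [(i, c, weights) for i, c in enumerate(chunks)]
    with Pool(processes=4) as pool:          # mappers run in parallel
        mapped = pool.map(mapper, tasks)
    predictions = reducer(mapped)
    print(len(predictions), "samples classified")
```

In an actual Hadoop deployment the chunking would be handled by the file system's input splits and the trained weights would be distributed to the mappers, but the map/reduce division of work is the same.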

Related Work
Parallelizing Neural Networks
Performance Evaluation
Conclusion