Abstract

Data classification is currently one of the most important approaches to data analysis. However, as data collection, transmission, and storage technologies have developed, the scale of data has increased sharply. In addition, because datasets often contain multiple classes with imbalanced distributions, the class-imbalance problem has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly degraded. This paper therefore presents an improved artificial neural network that enables high-performance classification of imbalanced, large-volume data. First, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims to improve the training of the back propagation neural network (BPNN); zero-mean normalization, batch normalization, and the rectified linear unit (ReLU) are then employed to optimize the input and hidden layers of the BPNN. Finally, the ensemble-learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Positive conclusions can be drawn from the experimental results. Benefitting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves both the training performance and the classification accuracy. The improvements to the input and hidden layers also enhance training convergence. The parallelization and ensemble-learning techniques enable the BPNN to perform high-performance, large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.
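To make the balancing step concrete, the following is a minimal from-scratch sketch of the Borderline-SMOTE idea described in the abstract: minority samples whose neighbourhoods are majority-dominated (but not entirely majority) are treated as "danger" points, and synthetic minority samples are interpolated between each danger point and its minority-class neighbours. This is an illustrative reconstruction, not the paper's implementation; the function name and the parameters `k`, `n_new`, and `seed` are assumptions.

```python
import numpy as np

def borderline_smote(X, y, minority_label, k=5, n_new=100, seed=0):
    """Sketch of Borderline-SMOTE: oversample only 'danger' minority points.

    A minority sample is in DANGER when at least half (but not all) of its
    k nearest neighbours belong to another class; synthetic samples are then
    interpolated between each danger point and its minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]

    # Distances from each minority sample to every sample in the dataset.
    d = np.linalg.norm(X[None, :, :] - X_min[:, None, :], axis=2)
    # k nearest neighbours (column 0 is the point itself, distance 0).
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    maj_count = (y[nn] != minority_label).sum(axis=1)
    danger = X_min[(maj_count >= k / 2) & (maj_count < k)]

    if len(danger) == 0:          # no borderline points found
        return X, y

    # Minority-class neighbours of each danger point, for interpolation.
    d_min = np.linalg.norm(danger[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]

    synth = []
    for _ in range(n_new):
        i = rng.integers(len(danger))
        j = nn_min[i, rng.integers(k)]
        gap = rng.random()        # interpolation factor in [0, 1)
        synth.append(danger[i] + gap * (X_min[j] - danger[i]))

    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new
```

The key design choice, per the Borderline-SMOTE method, is that only borderline minority samples spawn synthetic data, concentrating the oversampling where the classifier's decision boundary is actually contested.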

Highlights

  • Classification is one of the most effective approaches to analyzing digital data in many academic and research fields, for example, medical research [1,2,3,4,5,6] and power-system research [7,8,9,10,11,12]

  • In order to implement large-scale data classification, the Hadoop framework, based on the MapReduce computing model [51], is employed to parallelize the improved back propagation neural network (BPNN). This paper first separates the entire training dataset into a number of data chunks saved in HDFS (Hadoop Distributed File System), and each participating mapper initializes one sub-BPNN and inputs one data chunk, respectively

  • In order to support large-scale data classification, this paper presents a parallelized, improved BPNN algorithm
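The mapper/reducer workflow in the highlights above can be simulated on a single machine. In the sketch below, a trivial nearest-centroid model stands in for each sub-BPNN (training a real BPNN per chunk works the same way), each "mapper" trains on one data chunk, and the "reducer" combines the sub-models' predictions by weighted voting, with each model's vote weighted by its accuracy. All names here are illustrative assumptions; a real deployment would run the mappers as Hadoop tasks over HDFS chunks.

```python
import numpy as np

class CentroidClassifier:
    """Stand-in for a sub-BPNN: a trivial nearest-centroid model."""
    def fit(self, X, y):
        self.labels = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return self.labels[np.argmin(d, axis=1)]

def map_train(chunk):
    """'Mapper': train one sub-model on one data chunk and score it there."""
    X, y = chunk
    model = CentroidClassifier().fit(X, y)
    weight = (model.predict(X) == y).mean()   # accuracy as the vote weight
    return model, weight

def reduce_vote(trained, X_test):
    """'Reducer': accuracy-weighted vote over the sub-models' predictions."""
    labels = trained[0][0].labels
    votes = np.zeros((len(X_test), len(labels)))
    for model, w in trained:
        pred = model.predict(X_test)
        for i, c in enumerate(labels):
            votes[pred == c, i] += w
    return labels[np.argmax(votes, axis=1)]
```

For example, `np.array_split` can play the role of the HDFS chunking step: split `X`/`y` into three chunks, call `map_train` on each, then classify with `reduce_vote`. The weighted vote lets better-trained sub-models dominate the ensemble decision.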


Summary

Research Article

Received December 2019; Revised February 2020; Accepted 5 May 2020; Published 18 May 2020. This paper presents an improved artificial neural network enabling high-performance classification of imbalanced, large-volume data. The Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims to improve the training of the back propagation neural network (BPNN), and zero-mean normalization, batch normalization, and the rectified linear unit (ReLU) are further employed to optimize the input and hidden layers of the BPNN. The ensemble-learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Benefitting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves the training performance and the classification accuracy. The improvements to the input and hidden layers enhance training convergence. The parallelization and ensemble-learning techniques enable the BPNN to implement high-performance, large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.
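The layer-level improvements mentioned above (zero-mean input, batch normalization, ReLU activation) can be sketched as a single forward pass. This is an illustrative numerical sketch under the standard definitions of these operations, not the paper's code; `gamma` and `beta` are the usual learnable batch-norm parameters, left here at fixed default values.

```python
import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of activations to zero mean and unit variance
    per feature, then apply the learnable scale (gamma) and shift (beta)."""
    mu = h.mean(axis=0)
    var = h.var(axis=0)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

def relu(h):
    """Rectified linear unit: max(0, h), applied element-wise."""
    return np.maximum(0.0, h)

def hidden_forward(X, W, b):
    """One improved hidden layer: affine transform -> batch-norm -> ReLU."""
    return relu(batch_norm(X @ W + b))
```

Zero-mean preprocessing of the input layer is then simply `X - X.mean(axis=0)` before the first `hidden_forward` call; keeping activations centred and normalized is what stabilizes gradients and speeds up BPNN convergence.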

Introduction
Scientific Programming

[Full-text figure and table labels: BPNN input and hidden layers with ReLU; weighted voting over data blocks; the class-balancing algorithm; experimental metrics (testing instance number, training error, minimum/maximum accuracy, batch size) comparing a parallelized LSTM, a standalone BPNN, and the parallelized BPNN]

Conclusion
