Advancements in information technology is contributing to the excessive rate of big data generation recently. Big data refers to datasets that are huge in volume and consumes much time and space to process and transmit using the available resources. Big data also covers data with unstructured and structured formats. Many agencies are currently subscribing to research on big data analytics owing to the failure of the existing data processing techniques to handle the rate at which big data is generated. This paper presents an efficient classification and reduction technique for big data based on parallel generalized Hebbian algorithm (GHA) which is one of the commonly used principal component analysis (PCA) neural network (NN) learning algorithms. The new method proposed in this study was compared to the existing methods to demonstrate its capabilities in reducing the dimensionality of big data. The proposed method in this paper is implemented using Spark Radoop platform.
Read full abstract