Parallel Classification Algorithm Design of Human Resource Big Data Based on Spark Platform

Wang Zhouhuo, Jian Su

doi:10.1155/2021/5811918

Abstract

To address the problem of classifying large-scale human resource data, this study proposes a new parallel classification algorithm for human resource big data based on the Spark platform. The Spark platform is used to update the cluster centers of the human resource big data and to compute distances to them, and the big data clustering process is designed on this basis. The K-means clustering method is then introduced to mine frequent itemsets of the big data and to optimize the aggregation degree of similar data. A fuzzy genetic algorithm is used to identify the balance of the big data. The selective integration method is adopted to study the classifier for unbalanced human resource databases during transmission, and a decision contour matrix is introduced to construct the anomaly support model for the set of unbalanced human resource data classifiers. The features of the human resource big data are identified in parallel, the relevance of the data is repaired, and an improved ant colony algorithm is introduced to complete the design of the parallel classification algorithm. Experimental results show that the proposed algorithm has a low time cost, a good classification effect, and ideal parallel classification rule complexity.

Highlights

To address the problem of classifying large-scale human resource data, this study proposes a new parallel classification algorithm for human resource big data based on the Spark platform.
Yang and Zhu [6] propose an intelligent classification method for low-occupancy big data under cloud computing. The Bayesian algorithm is used to construct the intelligent classification model so that fault tolerance can be minimized through the naive Bayesian intelligent classifier in subsequent classification. A compression function and feature selection are constructed to train the intelligent classification model with the same degree of discrimination as the source data, and the features of the source data are classified by the trained model.
The big data intelligent classification algorithm based on cloud computing proposed in reference [6], the big data classification algorithm based on big data feature selection proposed in reference [7], and the big data classification algorithm based on parallel language fuzzy rules proposed in reference [8] are taken as the control group, and the performance of the different algorithms is compared through analysis of the experimental results.

Summary

Big Data Clustering and Mining Based on Spark Platform

The longest iteration in the clustering process is completed in memory, which improves data input and output efficiency. In mining frequent itemsets from big data, the a priori property can be used to compress the search space. Any core frequent itemset can serve as an original centroid of K-means [13, 14]. In this algorithm, the threshold is adjusted to determine the number of clusters. The main flow of the improved K-means clustering for mining big data frequent itemsets is as follows: Step 1: the FP-growth algorithm is used to generate the original cluster centroids and their number. Step 2: the original cluster centroids and their number are taken as the input of K-means, and the mining of big data frequent itemsets is completed.
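The two-step flow can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation, and several choices are assumptions made only for illustration: a level-wise Apriori-style search stands in for FP-growth (it makes the a priori pruning explicit), "core" frequent itemsets are taken to be the maximal ones, records are encoded as 0/1 item vectors, and the function names `frequent_itemsets`, `seed_centroids`, and `kmeans` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset search (Apriori-style stand-in for FP-growth)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    current = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    frequent, k = {}, 1
    while current:
        # Support = fraction of records that contain the candidate itemset.
        support = {c: sum(c <= b for b in baskets) / n for c in current}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # A priori property: keep a k-candidate only if all of its
        # (k-1)-subsets are frequent, which shrinks the search space.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def seed_centroids(frequent, items):
    """Assumption: maximal frequent itemsets act as the original centroids."""
    maximal = [s for s in frequent if not any(s < other for other in frequent)]
    maximal.sort(key=sorted)  # deterministic order
    return [[1.0 if i in s else 0.0 for i in items] for s in maximal]


def kmeans(points, centroids, iterations=10):
    """Plain K-means started from the supplied centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy records (hypothetical data): each set lists the attributes of one record.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"},
                {"d", "e"}, {"d", "e"}, {"c", "d", "e"}]
items = sorted({i for t in transactions for i in t})
seeds = seed_centroids(frequent_itemsets(transactions, 0.5), items)
points = [[1.0 if i in t else 0.0 for i in items] for t in transactions]
centroids, labels = kmeans(points, seeds)
print(len(seeds), labels)
```

In this sketch the number of clusters is not passed in: it equals the number of seed itemsets that survive the support threshold, so raising or lowering `min_support` changes how many centroids K-means starts from, which mirrors the role the text gives to the threshold.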

HR Big Data Parallel Classification Algorithm

Experimental Results and Analysis

Conclusions