Distributed random vector functional link network with subspace-based local connections

Wanguo Yu, Yulin He, Jiaqi Chen, Zhenhao Yuan

doi:10.3724/sp.j.1249.2022.06675

Abstract

To address the poor generalization ability and high computational complexity of the random vector functional link (RVFL) network on large-scale data classification, we design and implement a distributed RVFL network with subspace-based local connections (DRVFL-SLC) in the Spark framework. First, to exploit the partition parallelism of the resilient distributed dataset (RDD), the large-scale dataset stored in the Hadoop distributed file system (HDFS) is randomly divided into random sample partition (RSP) data blocks, each of which corresponds to one partition of the RDD; an RSP data block is a subset of the data whose probability distribution is consistent with that of the full data at a given significance level. Next, the mapPartitions transformation is invoked on the multi-partition RDD in the distributed environment to train the optimal RVFL-SLC for each partition in parallel. Then, the collect action is used to fuse the optimal RVFL-SLCs of all RDD partitions into DRVFL-SLC, which performs the classification of the big data. Finally, the feasibility and effectiveness of DRVFL-SLC are verified on several large-scale data sets with at least one million records, using a Spark cluster of 6 computing nodes. The results show that DRVFL-SLC achieves good speedup, scalability, and scale growth, and attains better generalization performance than an RVFL-SLC trained on a single machine with the full data.
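
The partition-train-fuse pipeline described above can be illustrated with a minimal single-machine sketch in NumPy. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the "subspace-based local connection" is modeled as a random binary mask on the input-to-hidden weights, each block is trained by closed-form ridge regression, fusion is a simple average of the per-block output scores, and a random shuffle stands in for RSP block generation. All function names and hyperparameters (`train_rvfl_slc`, `keep_frac`, `ridge`, and so on) are hypothetical. On a Spark cluster, the list comprehension in `train_distributed` would correspond to the mapPartitions transformation followed by collect.

```python
import numpy as np

def make_demo_data(n, seed=0):
    """Synthetic two-class data (hypothetical stand-in for a large data set)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 4)) + (2 * y[:, None] - 1) * 2.0
    return X, y

def train_rvfl_slc(X, Y, n_hidden=20, keep_frac=0.5, ridge=1e-2, seed=0):
    """Train one RVFL on a data block; returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    # ASSUMPTION: local connections = each hidden node sees a random feature subspace.
    W = W * (rng.random((d, n_hidden)) < keep_frac)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # RVFL design matrix: direct input links concatenated with hidden features.
    D = np.hstack([X, np.tanh(X @ W + b)])
    # Output weights by ridge regression; the random hidden weights stay fixed.
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
    return W, b, beta

def rvfl_scores(model, X):
    """Class scores of one trained RVFL on inputs X."""
    W, b, beta = model
    return np.hstack([X, np.tanh(X @ W + b)]) @ beta

def train_distributed(X, Y, n_partitions=4, seed=0):
    """Shuffle into blocks and train one model per block (mapPartitions + collect analogue)."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(X)), n_partitions)
    return [train_rvfl_slc(X[idx], Y[idx], seed=seed + k + 1)
            for k, idx in enumerate(blocks)]

def predict_fused(models, X):
    """ASSUMPTION: fuse the per-block models by averaging their scores."""
    avg = np.mean([rvfl_scores(m, X) for m in models], axis=0)
    return avg.argmax(axis=1)

if __name__ == "__main__":
    X, y = make_demo_data(2000, seed=1)
    models = train_distributed(X, np.eye(2)[y], n_partitions=4)
    accuracy = (predict_fused(models, X) == y).mean()
    print(f"fused training accuracy: {accuracy:.3f}")
```

Because each block only requires solving a small linear system, the per-partition training cost depends on the block size rather than the full data size, which is the property the distributed design relies on.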