Heterogeneous data mining algorithm of Internet of things based on Spark of biotech and artificial intelligence architecture

Chen Bing, Zhang Ting

doi:10.5912/jcb1040

Abstract

Because of memory limitations, heterogeneous data mining algorithms for the Internet of Things take too long to process the large-scale data of biotech companies. This paper proposes a heterogeneous data mining algorithm for the Internet of Things based on the Spark artificial intelligence architecture. The heterogeneous data are abstracted into feature values of a unified dimension related to the target task, attributes are established, and the preprocessing result is output. A distributed parallel computing framework is built on the Spark artificial intelligence architecture, and the preprocessed data are distributed to different execution points. The data at each execution point are converted into a bit matrix to calculate the degree of association between attributes, and the data are partitioned according to the association results. Following the association rules of biotech companies, local and global frequent pattern trees are generated to complete the heterogeneous data mining. Experimental results show that the execution time of the proposed algorithm is 1892 s, which is 1147 s, 1991 s and 624 s less than that of the HUIM-ACO, FP-Growth and ENFP-Growth algorithms, respectively. The proposed algorithm effectively shortens data processing time and improves execution efficiency.
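
The bit-matrix step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in plain Python rather than on Spark, it simulates execution points as list slices, and it assumes that the "attribute association degree" is the fraction of records in which two attributes co-occur, which the abstract does not define. The function names and the sensor attributes (`temp`, `humidity`, `ph`) are hypothetical.

```python
from itertools import combinations


def to_bit_columns(records, attributes):
    """Encode each attribute as an int whose bit i is set when record i contains it."""
    columns = {a: 0 for a in attributes}
    for i, record in enumerate(records):
        for a in record:
            columns[a] |= 1 << i
    return columns


def pair_counts(records, attributes):
    """Count co-occurrences of every attribute pair with a bitwise AND and a popcount."""
    cols = to_bit_columns(records, attributes)
    return {
        (a, b): bin(cols[a] & cols[b]).count("1")
        for a, b in combinations(sorted(attributes), 2)
    }


def association_degree(partitions, attributes):
    """Merge per-partition pair counts and normalise by the total record count.

    Assumption: association degree = co-occurrence support of the pair.
    """
    total = sum(len(p) for p in partitions)
    merged = {}
    for part in partitions:
        for pair, n in pair_counts(part, attributes).items():
            merged[pair] = merged.get(pair, 0) + n
    return {pair: n / total for pair, n in merged.items()}


# Hypothetical preprocessed records: each is the set of attributes present.
records = [
    {"temp", "humidity"},
    {"temp", "humidity", "ph"},
    {"temp", "ph"},
    {"humidity"},
]
attributes = {"temp", "humidity", "ph"}

# Two simulated execution points, each holding half of the records.
partitions = [records[:2], records[2:]]
degrees = association_degree(partitions, attributes)
```

With these four records, `degrees[("humidity", "temp")]` is 0.5 because the two attributes appear together in two of the four records. Each partition counts pairs independently and only the small count tables are merged, so the same structure would map onto a per-partition computation followed by a reduction in a distributed framework.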

Full Text