Performing in-situ analytics: Mining frequent patterns from big IoT data at network edge with D-HARPP

Muhammad Yasir, Ali Haidar, Muhammad Umar Chaudhry, Muhammad Asif Habib, Aamir Hussain, Elżbieta Jasińska, Zbigniew Leonowicz, Michał Jasiński

doi:10.1016/j.engappai.2022.105480

Abstract

Big IoT data is inherently distributed, high-dimensional, irregular, and sparse. The fog computing model, in its original form, is by no means the optimal solution for mining big IoT data. However, using the network edge for mining tasks, for example by enabling edge and IoT devices to mine frequent patterns locally, can significantly improve mining performance. In addition, edge devices capable of distributed job processing could exploit the model to the fullest. The limited resources of edge and IoT devices, however, call for lightweight pattern mining algorithms. This paper presents Distributed HARnessing the Power of Powersets for Mining Frequent Itemsets (D-HARPP), a Spark-based distributed algorithm for mining frequently co-occurring itemsets in big IoT data. Unlike state-of-the-art distributed algorithms, D-HARPP makes a single pass over the data and does not generate candidate itemsets; as a result, it achieves significantly better runtime and consumes the least memory. Moreover, the performance of D-HARPP does not deteriorate at lower minimum support thresholds. These characteristics make D-HARPP an optimal choice for Spark-enabled edge and IoT devices. D-HARPP outperformed Spark-Apriori, another distributed algorithm, by significant margins in both runtime and memory consumption, particularly on sparse datasets.
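To illustrate the general idea of single-pass, candidate-free counting described above, the sketch below enumerates the powerset of each transaction once, counts every subset per partition, merges the partial counts, and keeps itemsets whose support meets a relative minimum threshold. This is a minimal sketch under our own assumptions, not the authors' D-HARPP implementation: the function names `local_powerset_counts` and `distributed_frequent_itemsets` are hypothetical, and plain Python lists stand in for Spark partitions.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def local_powerset_counts(transactions):
    """Count every non-empty subset of each transaction in a single pass.

    No candidate itemsets are generated: each subset that actually occurs
    in a transaction is counted directly. (Illustrative assumption, not D-HARPP.)
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so ('a', 'b') == ('b', 'a') as a key
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[subset] += 1
    return counts


def distributed_frequent_itemsets(partitions, min_support):
    """Map step: count subsets per partition; reduce step: merge counts; then filter.

    `partitions` is a list of transaction lists (a stand-in for Spark partitions);
    `min_support` is a relative threshold in (0, 1].
    """
    partial = [local_powerset_counts(p) for p in partitions]
    total = reduce(lambda a, b: a + b, partial, Counter())
    n_transactions = sum(len(p) for p in partitions)
    threshold = min_support * n_transactions
    return {itemset: c for itemset, c in total.items() if c >= threshold}


# Usage: two hypothetical partitions holding four transactions in total.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"]],
]
frequent = distributed_frequent_itemsets(partitions, min_support=0.5)
print(frequent)
```

With four transactions and a 0.5 threshold, every single item and every pair occurs at least twice, while `('a', 'b', 'c')` occurs only once and is filtered out. Because a transaction of length m has 2^m - 1 non-empty subsets, this style of counting is practical mainly when transactions are short, which is consistent with the abstract's emphasis on sparse datasets. In PySpark, the same shape would roughly correspond to an RDD `flatMap` over each transaction's subsets, a `reduceByKey` to sum counts, and a `filter` on the support threshold.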
