HE-Gaston algorithm for frequent subgraph mining with Hadoop framework

D.B Jagannadha Rao, Vijayakumar Polepally, Parsi Kalpana, S Nagendra Prabhu

doi:10.1016/j.eswa.2024.123971

Abstract

Graph mining plays a key role in data mining, and it becomes more complicated as the size of the data increases. Identifying interesting subgraphs in a graph is a widely researched problem, where the subgraphs denote frequently occurring patterns that exhibit a particular structure. Frequent Subgraph Mining (FSM) is an important task for exploratory data analysis on graph data. Although many techniques have been proposed for FSM, the large dimension of the data makes FSM complex. This research proposes a novel technique for performing FSM in a Hadoop framework. FSM is carried out using the proposed Holoentropy Gaston algorithm (HE-Gaston), which is developed by replacing the Recurrent support measure in the Recurrent-Gaston (R-Gaston) technique with the Holoentropy support measure. Weblog files are considered for FSM and are fed to the Spark framework, which comprises a master node and several slave nodes. The slave nodes generate frequent subgraphs based on the Holoentropy support measure, and these subgraphs are passed to the master node, which produces the final frequent subgraphs using an aggregate Holoentropy support measure. HE-Gaston recorded an execution time of 43 ms, memory usage of 2.161 MB, and 117 mined structures.

Full Text