Parallelized Similarity Flooding Algorithm for Processing Large Scale Graph Datasets with MapReduce

Jian Zhang, Chunfeng Yuan, Yihua Huang

doi:10.1109/pdcat.2012.109

Abstract

Measures of graph similarity have a broad range of applications but are compute-intensive. The similarity flooding algorithm compares graphs efficiently when the graphs and the datasets are small. However, large-scale graph applications are increasingly common, and the existing stand-alone similarity flooding algorithm cannot compare large-scale graph datasets in acceptable time. This paper presents a parallelized similarity flooding algorithm built on MapReduce for large-scale graph datasets. Experimental results demonstrate that the parallelized algorithm achieves a significant performance improvement over the stand-alone similarity flooding algorithm, and that it obtains excellent speedup as the cluster size increases.
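To make the idea concrete, the following is a minimal Python sketch of how one similarity flooding iteration might be phrased as map and reduce steps. It is not the authors' implementation: it simulates the shuffle phase in memory rather than using a MapReduce framework, it assumes the basic fixpoint update (each node pair keeps its own score, adds the weighted scores of its neighbors, and all scores are then normalized by the maximum), and all function names (`build_propagation_graph`, `map_step`, `reduce_step`, `flood_once`, `similarity_flooding`) are hypothetical.

```python
from collections import defaultdict


def build_propagation_graph(edges_a, edges_b):
    """Return {pair: [(neighbor_pair, weight), ...]} for two labeled graphs.

    edges_a and edges_b are lists of (source, label, target) triples.
    """
    # Pairwise connectivity: two node pairs are linked when both graphs
    # contain an edge with the same label between the corresponding nodes.
    pcg = [((a, b), la, (a2, b2))
           for (a, la, a2) in edges_a
           for (b, lb, b2) in edges_b
           if la == lb]

    # Count same-label edges leaving each pair and entering each pair.
    fanout = defaultdict(int)
    for src, label, dst in pcg:
        fanout[(src, label, "out")] += 1
        fanout[(dst, label, "in")] += 1

    # Each connectivity edge propagates similarity in both directions,
    # with the weight split evenly among same-label edges.
    graph = defaultdict(list)
    for src, label, dst in pcg:
        graph[src].append((dst, 1.0 / fanout[(src, label, "out")]))
        graph[dst].append((src, 1.0 / fanout[(dst, label, "in")]))
    return dict(graph)


def map_step(pair, sim, neighbors):
    """Emit the pair's own score plus a weighted share for each neighbor."""
    yield pair, sim
    for neighbor, weight in neighbors:
        yield neighbor, sim * weight


def reduce_step(pair, contributions):
    """Sum every contribution that was routed to this pair."""
    return pair, sum(contributions)


def flood_once(graph, sims):
    """Run one map / shuffle / reduce round, then normalize by the maximum."""
    shuffled = defaultdict(list)  # in-memory stand-in for the shuffle phase
    for pair, neighbors in graph.items():
        for key, value in map_step(pair, sims[pair], neighbors):
            shuffled[key].append(value)
    reduced = dict(reduce_step(k, v) for k, v in shuffled.items())
    if not reduced:
        return {}
    top = max(reduced.values())
    return {k: v / top for k, v in reduced.items()}


def similarity_flooding(edges_a, edges_b, iterations=10):
    """Iterate a fixed number of rounds (assumed; no convergence test)."""
    graph = build_propagation_graph(edges_a, edges_b)
    sims = {pair: 1.0 for pair in graph}  # assumed uniform initial similarity
    for _ in range(iterations):
        sims = flood_once(graph, sims)
    return sims
```

In this sketch every round is an independent pass over the node pairs, which is the property that would let a real MapReduce job distribute the map calls across a cluster. A production version would also need a convergence check and a separate job (or counter) for the global maximum used in normalization.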

Full Text