Iterative Hadoop MapReduce-Based Subgraph Enumeration in Network Motif Analysis

Wooyoung Kim, Vartika Verma, Paul Park Kwon

doi:10.1109/cloud.2015.122

Abstract

Finding network motifs in biological networks is a computationally intensive task. It involves traversing a large network to enumerate all possible subgraphs of a given size, and then determining their statistical uniqueness by sampling subgraphs from a large pool of random graphs (more than 1,000). Past efforts have parallelized this process to reduce its computational cost. However, they either target frequent subgraphs rather than network motifs, or they require complex message-passing manipulations or designs. Additionally, most of these parallel algorithms are not available as usable tools. In this paper, we introduce a project of 'complete network motif parallelization.' The project aims to improve the performance of two serial algorithms for finding subgraphs in protein-protein interaction (PPI) networks, the ESU algorithm and McKay's canonical labeling algorithm. It parallelizes them using iterative Hadoop MapReduce on Google Cloud, and then determines the uniqueness of the subgraphs through explicit or direct random graph generation. Here we describe the parallelization of the ESU and McKay's canonical labeling algorithms and present experimental results showing a significant performance improvement, with a speedup of up to 37 times. As a next step, we are parallelizing the computation of network motif significance, and we expect to complete the project in the near future.
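The pipeline described above, enumerating size-k subgraphs with ESU and grouping them by canonical label, can be illustrated with a small serial sketch. This is not the authors' Hadoop implementation. The function names `esu`, `canonical_form`, `map_phase`, and `reduce_phase` are hypothetical. Brute-force permutation labeling is swapped in for McKay's canonical labeling algorithm, which is far more efficient. The map and reduce steps are simulated in memory, which only shows how the work could be split into a key-emitting map and a counting reduce.

```python
from collections import Counter
from itertools import permutations


def esu(adj, k):
    """Yield every connected induced k-vertex subgraph exactly once (ESU).

    adj maps each integer vertex to the set of its neighbors (undirected graph).
    """
    def extend(sub, ext, v):
        if len(sub) == k:
            yield frozenset(sub)
            return
        # Closed neighborhood of the current subgraph: its vertices plus their neighbors.
        closed = set(sub)
        for x in sub:
            closed |= adj[x]
        ext = set(ext)  # copy so the caller's extension set is left untouched
        while ext:
            w = ext.pop()
            # Add w's exclusive neighbors (not in or adjacent to sub) larger than root v.
            new_ext = ext | {u for u in adj[w] if u > v and u not in closed}
            yield from extend(sub | {w}, new_ext, v)

    for v in adj:
        # Root each enumeration at v and only extend toward larger vertex ids.
        yield from extend({v}, {u for u in adj[v] if u > v}, v)


def canonical_form(adj, nodes):
    """Brute-force canonical label (a stand-in for McKay's algorithm).

    Returns the lexicographically smallest upper-triangle adjacency code over all
    vertex orderings, so two subgraphs share a code iff they are isomorphic.
    The cost grows as k!, so this is only practical for small k.
    """
    n = len(nodes)
    return min(
        tuple(1 if perm[j] in adj[perm[i]] else 0
              for i in range(n) for j in range(i + 1, n))
        for perm in permutations(nodes)
    )


def map_phase(adj, k):
    """Hypothetical map step: emit (canonical label, 1) for each enumerated subgraph."""
    for sub in esu(adj, k):
        yield canonical_form(adj, sub), 1


def reduce_phase(pairs):
    """Hypothetical reduce step: sum the counts for each isomorphism class."""
    counts = Counter()
    for label, one in pairs:
        counts[label] += one
    return counts


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(dict(reduce_phase(map_phase(adj, 3))))
```

On this sample graph, the three connected 3-vertex subgraphs are one triangle ({0, 1, 2}) and two paths centered on vertex 2 ({0, 2, 3} and {1, 2, 3}), so the reduce step reports two isomorphism classes. In a distributed setting, root vertices could plausibly be partitioned across mappers because each ESU root explores its subtree independently. That partitioning is an assumption of this sketch, not a detail given in the abstract.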
