Abstract

We present a new shared-memory parallel algorithm and implementation called FASCIA for the problems of approximate sub graph counting and sub graph enumeration. The problem of sub graph counting refers to determining the frequency of occurrence of a given sub graph (or template) within a large network. This is a key graph analytic with applications in various domains. In bioinformatics, sub graph counting is used to detect and characterize local structure (motifs) in protein interaction networks. Exhaustive enumeration and exact counting is extremely compute-intensive, with running time growing exponentially with the number of vertices in the template. In this work, we apply the color coding technique to determine approximate counts of non-induced occurrences of the sub graph in the original network. Color coding gives a fixed-parameter algorithm for this problem, using a dynamic programming-based counting approach. Our new contributions are a multilevel shared-memory parallelization of the counting scheme and several optimizations to reduce the memory footprint. We show that approximate counts can be obtained for templates with up to 12 vertices, on networks with up to millions of vertices and edges. Prior work on this problem has only considered out-of-core parallelization on distributed platforms. With our new counting scheme, data layout optimizations, and multicore parallelism, we demonstrate a significant speedup over the current state-of-the-art for sub graph counting.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call