CmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining

Shunyun Yang, Shaoliang Peng, Rui Liu, Xiangke Liao, Runxin Guo, Quan Zou, Benyun Shi

doi:10.1186/s12859-018-2071-z

Abstract

Background

Frequent subgraph mining is a significant problem in many practical domains. Solutions to this problem can be applied to large-scale drug-molecule or biological libraries to help find drugs or core biological structures rapidly and to predict the toxicity of unknown compounds. The main challenge is efficiency, because (i) testing for graph isomorphism is computationally intensive, and (ii) both the graph collection to be mined and the mining results can be very large. Existing solutions often need days to derive mining results from biological networks when the support threshold is relatively low. Moreover, the complete mining results often cannot be stored in the memory of a single node.

Results

In this paper, we implement cmFSM, a parallel acceleration tool for a classical frequent subgraph mining algorithm. The core idea is to parallelize extension tasks to reduce computation time, and to use a multi-node strategy to overcome memory constraints. cmFSM is optimized at three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process acceleration, and CPU-MIC collaborative parallel optimization.

Conclusions

Evaluation results show that cmFSM clearly outperforms existing state-of-the-art miners, even with only a few parallel computing resources. cmFSM therefore provides a practical solution to frequent subgraph mining problems that produce a huge number of mining results. Specifically, it is up to one order of magnitude faster than the best CPU-based approach on a single node and shows promising scalability for massive mining tasks in the multi-node scenario. The source code is available at: https://github.com/ysycloud/cmFSM.
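The idea of parallelizing extension tasks rests on the observation that, in pattern-growth miners, each frequent one-edge pattern seeds a branch of the search that can be explored largely independently of the others. A minimal sketch under that assumption: count one-edge patterns in the transaction setting, keep those contained in at least `min_sup` graphs, and hand each seed to a worker pool. Graphs are simplified to lists of labeled edges, and `project` is a hypothetical per-seed task that only builds the seed's projected database (the IDs of graphs containing it); none of these names come from cmFSM. The abstract states that cmFSM uses OpenMP on a single node; Python's `ThreadPoolExecutor` stands in here only to show the task decomposition, since CPython threads do not speed up CPU-bound pure-Python work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Simplified encoding (an assumption): each graph is a list of labeled edges
# (vertex_label_u, edge_label, vertex_label_v).


def edge_pattern(lu, le, lv):
    """Canonical form of a one-edge pattern in an undirected graph."""
    return (lu, le, lv) if lu <= lv else (lv, le, lu)


def frequent_edges(database, min_sup):
    """Return the one-edge patterns contained in at least min_sup graphs."""
    supporting = defaultdict(set)
    for gid, edges in enumerate(database):
        for lu, le, lv in edges:
            supporting[edge_pattern(lu, le, lv)].add(gid)
    return sorted(p for p, gids in supporting.items() if len(gids) >= min_sup)


def project(database, seed):
    """Hypothetical per-seed task: collect the IDs of graphs containing the seed.

    A real miner would keep growing the seed inside this projected database.
    """
    gids = [gid for gid, edges in enumerate(database)
            if any(edge_pattern(*e) == seed for e in edges)]
    return seed, gids


def mine_in_parallel(database, min_sup, workers=4):
    """Dispatch one independent task per frequent seed edge to a worker pool."""
    seeds = frequent_edges(database, min_sup)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda seed: project(database, seed), seeds))


db = [
    [("C", "s", "C"), ("C", "s", "O")],
    [("O", "s", "C")],
    [("C", "s", "C"), ("C", "d", "O")],
]
print(mine_in_parallel(db, min_sup=2))
# {('C', 's', 'C'): [0, 2], ('C', 's', 'O'): [0, 1]}
```

Because `pool.map` returns results in input order, the output is deterministic regardless of which worker finishes first; in an OpenMP version the analogous choice would be a parallel loop over the seeds with dynamic scheduling, since branch sizes can differ widely.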

Highlights

Frequent subgraph mining is a significant problem in many practical domains
We use the Many Integrated Core (MIC) coprocessor in offload mode, transferring only double-edge frequent subgraphs and redundantly backing up complex data structures, to avoid bottlenecks caused by excessive data transmission
We have evaluated the performance of cmFSM in five aspects: (i) parallelization on a single node, (ii) the multi-node division strategy, (iii) the efficiency of multi-node multi-thread acceleration, (iv) CPU/MIC collaboration and (v) multi-node CPU/MIC collaboration
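When only the double-edge frequent subgraphs are offloaded and the remaining data structures are already replicated on both sides, the main run-time decision is which seeds each device (or node) mines. The source does not say how cmFSM balances this load, so the following is only a sketch of one common approach, greedy list scheduling: place the largest tasks first, each on the device that would finish it earliest. The per-seed cost estimates, the device speed ratio and the pattern strings are all hypothetical inputs, not values from the paper.

```python
def split_seeds(seeds, speeds):
    """Greedy list scheduling of seed tasks onto devices of different speed.

    seeds:  list of (pattern, estimated_cost) pairs (costs are hypothetical estimates).
    speeds: {device_name: relative_throughput}, e.g. a CPU and a MIC card, or nodes.
    Returns (plan, finish): the patterns assigned to each device and each
    device's projected finish time. Largest tasks are placed first, each on the
    device that would complete it earliest.
    """
    finish = {dev: 0.0 for dev in speeds}
    plan = {dev: [] for dev in speeds}
    for pattern, cost in sorted(seeds, key=lambda s: (-s[1], s[0])):
        dev = min(speeds, key=lambda d: (finish[d] + cost / speeds[d], d))
        finish[dev] += cost / speeds[dev]
        plan[dev].append(pattern)
    return plan, finish


# Illustrative double-edge seeds with made-up costs; the MIC is assumed 2x faster.
seeds = [("A-B-C", 8), ("A-B-A", 4), ("B-C-C", 4), ("C-C-C", 2)]
plan, finish = split_seeds(seeds, {"cpu": 1.0, "mic": 2.0})
print(plan)    # {'cpu': ['A-B-A', 'C-C-C'], 'mic': ['A-B-C', 'B-C-C']}
print(finish)  # {'cpu': 6.0, 'mic': 6.0}
```

Only the short list of seed patterns has to cross the host-coprocessor boundary under this scheme, which matches the highlight's goal of limiting transmission; the same function applies unchanged if the keys of `speeds` are cluster nodes rather than devices.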

Summary

Introduction

Frequent subgraph mining is a significant problem in many practical domains. Solutions to this problem can be applied to large-scale drug-molecule or biological libraries to help find drugs or core biological structures rapidly and to predict the toxicity of unknown compounds. […] The second case is usually applied in computational pharmacology and bioinformatics. A large input with a relatively low support threshold can produce a huge number of mining results, which may exceed the memory of a single machine and require vast amounts of runtime. Given these characteristics, parallel techniques are a promising way to address these challenges. We mainly focus on the second case, known as the transaction setting [11], which is more practical in bioinformatics.
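In the transaction setting, the database is a collection of graphs (for example, one graph per molecule), the support of a pattern is the number of graphs that contain it, and a pattern is frequent when its support is at least a user-given threshold. The sketch below makes this concrete with a brute-force containment test that tries every injective vertex mapping, which also shows why isomorphism testing dominates the cost. The graph encoding and the function names are illustrative assumptions, not cmFSM's data structures; practical miners rely on canonical codes and pruning rather than exhaustive mapping.

```python
from itertools import permutations

# Illustrative encoding (an assumption): a graph is (vertex_labels, edges), where
# edges maps an unordered vertex pair (u, v) with u < v to an edge label.


def edge_key(u, v):
    return (u, v) if u < v else (v, u)


def contains(graph, pattern):
    """Brute-force subgraph-isomorphism test: does `graph` contain `pattern`?

    Tries every injective mapping of pattern vertices onto graph vertices, so the
    cost grows factorially with graph size; suitable only for tiny examples.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    for mapping in permutations(range(len(g_labels)), len(p_labels)):
        if any(g_labels[mapping[i]] != p_labels[i] for i in range(len(p_labels))):
            continue  # vertex labels do not match
        if all(g_edges.get(edge_key(mapping[u], mapping[v])) == label
               for (u, v), label in p_edges.items()):
            return True  # every pattern edge exists with the same label
    return False


def support(database, pattern):
    """Number of graphs (transactions) in the database that contain the pattern."""
    return sum(1 for g in database if contains(g, pattern))


def is_frequent(database, pattern, min_sup):
    return support(database, pattern) >= min_sup


# Three toy "molecules"; "s" marks a single bond.
database = [
    (["C", "C", "O"], {(0, 1): "s", (1, 2): "s"}),  # C-C-O
    (["C", "O"], {(0, 1): "s"}),                    # C-O
    (["C", "C"], {(0, 1): "s"}),                    # C-C
]
c_o = (["C", "O"], {(0, 1): "s"})
c_c_o = (["C", "C", "O"], {(0, 1): "s", (1, 2): "s"})
print(support(database, c_o), support(database, c_c_o))  # 2 1
print(is_frequent(database, c_c_o, min_sup=2))           # False
```

Lowering `min_sup` admits more patterns, and every candidate pattern needs a containment test against every transaction, which is why a low threshold on a large library produces both the huge result sets and the long runtimes described above.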

Journal: BMC Bioinformatics 2018, 19(Suppl 4)
Publication date: May 1, 2018
Citations: 7
License type: open access

Similar Papers

A distributed approach to weighted frequent Subgraph mining
Nisha Babu ... Ansamma John
01 Oct 2016

Ap-FSM: A parallel algorithm for approximate frequent subgraph mining using Pregel
Vandana Bhatia ... Rinkle Rani
Expert Systems With Applications, Vol. 106
09 Apr 2018

MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining
Prakash Shelokar ... Óscar Cordón
Knowledge and Information Systems, Vol. 34
17 Nov 2011

A Projection Bias in Frequent Subgraph Mining Can Make a Difference
Brahim Douar ... Yahya Slimani
International Journal on Artificial Intelligence Tools, Vol. 23
01 Oct 2014
