Abstract

We propose POP-MBC (Parallel Order Preserving Maximal BiClique mining algorithm), a fast and memory-efficient parallel algorithm for mining large maximal bicliques from graph datasets. POP-MBC enumerates all the maximal bicliques independently and concurrently across several processors, without any synchronization between them. The algorithm is highly memory efficient because it does not store previously computed patterns in main memory; only the dataset needs to be held in memory. To improve load sharing among the nodes, POP-MBC uses a round-robin strategy, which achieves load balancing as high as 90%. We also use bit-vectors and several optimization techniques that exploit the symmetry of the graph dataset to reduce memory consumption and overall running time. Our comprehensive experimental analyses on publicly available datasets show that the algorithm distributes the load equally among the processors and uses less memory and less running time than other maximal biclique mining algorithms.
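The round-robin idea can be illustrated with a minimal sketch. The abstract does not say what the unit of work is, so the sketch assumes it is a list of start vertices; the function name `round_robin_partition` is hypothetical and not taken from the paper.

```python
def round_robin_partition(items, num_workers):
    """Deal items out to workers in turn: item i goes to worker i % num_workers.

    Assumption: each item is an independent unit of work (here, a start vertex),
    so every worker can process its share without talking to the others.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Slice with a stride: worker w gets items w, w + num_workers, ...
    return [items[w::num_workers] for w in range(num_workers)]


if __name__ == "__main__":
    start_vertices = list(range(7))
    for worker, share in enumerate(round_robin_partition(start_vertices, 3)):
        print(worker, share)
```

Because each worker's share is fixed in advance by the item index alone, no coordination is needed at run time, which is consistent with the synchronization-free design described above.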

Highlights

  • The need for maximal biclique mining from graph datasets has been well discussed in the literature [1][4][5][6][8] and has several applications in the field of data mining including social network analysis and protein interaction network analysis [1]

  • The major drawback of the MICA algorithm is that it requires all generated maximal biclique subgraphs to be stored in main memory for duplicate detection, so it runs out of memory in most executions on large graph datasets

  • The LCM-MBC algorithm is based on the following properties: (i) the number of closed patterns of a symmetric adjacency matrix is even; (ii) for every maximal biclique subgraph, there exists a unique closed pattern pair which corresponds to its vertex sets


Summary

INTRODUCTION

The need for maximal biclique (complete bipartite subgraph) mining from graph datasets has been well discussed in the literature [1][4][5][6][8], and it has several applications in data mining, including social network analysis and protein interaction network analysis [1]. The major drawback of the MICA algorithm is that it requires all generated maximal biclique subgraphs to be stored in main memory for duplicate detection, so it runs out of memory in most executions on large graph datasets. Unlike MICA, the LCM-MBC algorithm does not store already computed bicliques in memory and never runs out of main memory. The LCM-MBC algorithm is based on the following properties: (i) the number of closed patterns of a symmetric adjacency matrix is even; (ii) for every maximal biclique subgraph, there exists a unique closed pattern pair which corresponds to its vertex sets. Contributions: In this paper, we propose the POP-MBC algorithm, which enumerates the large maximal bicliques concurrently on several processors without any synchronization.
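Properties (i) and (ii) can be checked on a small graph with a brute-force sketch. This is neither LCM-MBC nor POP-MBC; it only illustrates the closed-pattern pairing, and it assumes an undirected graph without self-loops, with vertex sets stored as Python integers used as bit-vectors. All function names are hypothetical.

```python
def adjacency_bitvectors(n, edges):
    """Build one bit-vector per vertex; bit v of adj[u] is set iff u-v is an edge."""
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u  # symmetric adjacency matrix
    return adj


def common_neighbors(adj, mask):
    """Bitwise AND of the adjacency vectors of every vertex in mask."""
    result = (1 << len(adj)) - 1
    for v in range(len(adj)):
        if (mask >> v) & 1:
            result &= adj[v]
    return result


def closed_pattern_pairs(adj):
    """Enumerate pairs {X, Y} with Y = N(X) and X = N(Y), both non-empty.

    Each pair is one maximal biclique. Exponential in the number of
    vertices, so suitable for toy inputs only.
    """
    pairs = set()
    for x in range(1, 1 << len(adj)):
        y = common_neighbors(adj, x)
        if y and common_neighbors(adj, y) == x:
            pairs.add(frozenset((x, y)))
    return pairs


if __name__ == "__main__":
    # Path graph 0-1-2-3.
    adj = adjacency_bitvectors(4, [(0, 1), (1, 2), (2, 3)])
    for pair in closed_pattern_pairs(adj):
        print(sorted(bin(side) for side in pair))
```

In this sketch every closed pattern X is paired with a different closed pattern N(X), since without self-loops a vertex set never contains its own common neighbors. The closed patterns therefore come in pairs, and each pair gives the two vertex sets of one maximal biclique.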

PRELIMINARIES
POP-MBC PSEUDO CODE
IMPLEMENTATION AND RESULT ANALYSIS
CONCLUSION
