A GPU-accelerated algorithm for biclustering analysis and detection of condition-dependent coexpression network modules

Anindya Bhattacharya, Yan Cui

doi:10.1038/s41598-017-04070-4

Abstract

In the analysis of large-scale gene expression data, it is important to identify groups of genes with common expression patterns under certain conditions. Many biclustering algorithms have been developed to address this problem. However, comprehensive discovery of functionally coherent biclusters from large datasets remains challenging. Here we propose a GPU-accelerated biclustering algorithm based on searching for the largest Condition-dependent Correlation Subgroups (CCS) for each gene in the gene expression dataset. We compared CCS with thirteen widely used biclustering algorithms. CCS consistently outperformed all thirteen on both synthetic and real gene expression datasets. As a correlation-based biclustering method, CCS can also be used to find condition-dependent coexpression network modules. We implemented the CCS algorithm in C and the parallelized CCS algorithm in CUDA C for GPU computing. The source code of CCS is available from https://github.com/abhatta3/Condition-dependent-Correlation-Subgroups-CCS.

Highlights

Clustering algorithms have been widely used to group genes based on their similarities in expression[1,2,3,4]
The benefits of Pearson correlation coefficient based similarity measures over conventional mean square residue based bicluster scores were demonstrated again by the Bi-Correlation Clustering algorithm (BCCA), which looks for positively correlated biclusters and reports biclusters for each pair of genes present in a dataset[13]
The performance of Condition-dependent Correlation Subgroup (CCS) was compared with Correlated Pattern Biclusters (CPB), BCCA, BICLIC and ten other widely-used biclustering algorithms on 5 synthetic and 5 real gene expression datasets

Summary

Introduction

Clustering algorithms have been widely used to group genes based on their similarities in expression[1,2,3,4]. Clustering algorithms that group genes by similarity over all the samples in a dataset are not effective for detecting condition-dependent coexpression patterns. The conventional way of finding biclusters depends on selecting random seeds of genes and/or samples and then augmenting them based on a scoring function. Genes in a module may be expressed with similar or opposite patterns of expression even though their expression values are very different. Such modules are important as they may represent relations between genes in the same biological functions[1,13]. We propose a more effective correlation-based biclustering algorithm named Condition-dependent Correlation Subgroup (CCS). It integrates several important features for developing an effective algorithm for comprehensive discovery of functionally coherent biclusters[1]. The performance of CCS was compared with Correlated Pattern Biclusters (CPB), the Bi-Correlation Clustering algorithm (BCCA), BICLIC and ten other widely-used biclustering algorithms on 5 synthetic and 5 real gene expression datasets. We showed that there is equivalence between the CCS biclusters and condition-dependent coexpression network modules.
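The motivation above can be illustrated with a small sketch. It is not the CCS algorithm itself, only a toy demonstration of why correlation restricted to a subset of samples can reveal structure that correlation over all samples hides. The gene profiles, the `condition` index list, and the helper names `pearson` and `restrict` are hypothetical and chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def restrict(profile, samples):
    """Keep only the expression values at the given sample indices."""
    return [profile[i] for i in samples]

# Hypothetical toy expression profiles over 8 samples (not real data).
gene_a = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
# gene_b follows gene_a in samples 0-3 and moves against it in samples 4-7.
gene_b = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
# gene_c mirrors gene_a in samples 0-3 at a roughly 1000x larger scale.
gene_c = [4000.0, 3000.0, 2000.0, 1000.0, 2500.0, 2600.0, 2400.0, 2500.0]

# Assumed "condition": the first four samples.
condition = [0, 1, 2, 3]

# Over all samples the two opposite halves cancel, so the relation is hidden.
r_all = pearson(gene_a, gene_b)

# Restricted to the condition, the same pair is strongly positively correlated.
r_cond = pearson(restrict(gene_a, condition), restrict(gene_b, condition))

# Opposite pattern and very different magnitudes: strongly negative correlation.
r_opposite = pearson(restrict(gene_a, condition), restrict(gene_c, condition))

print(f"all samples, a vs b:    {r_all:+.3f}")
print(f"condition only, a vs b: {r_cond:+.3f}")
print(f"condition only, a vs c: {r_opposite:+.3f}")
```

In this toy case the correlation over all eight samples is essentially zero, while the same pair is almost perfectly correlated within the chosen condition. Because Pearson correlation is insensitive to scale and offset, the opposite-pattern pair is also detected despite the large difference in expression values.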

Methods

Results

Conclusion