A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.

Aaditya V Rangan, Preeti Raghavan, Nicholas Schork, Vicky Yao, Anders Jureus, Caroline C Mcgrouther, Mikael Landen, Olga Troyanskaya, Seda Bilaloglu, Qian Zhu, John Kelsoe, Sarah Bergen, Arjun Krishnan, Eli Stahl

doi:10.1371/journal.pcbi.1006105

Abstract

A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical and continuous covariates, as well as sparsity within the data. We demonstrate these practical features with two examples: the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).

Highlights

Many applications in data-analysis involve some form of ‘biclustering’, also referred to as coclustering, two-mode clustering, two-way clustering, block clustering, and coupled two-way clustering, to name a few.
An important problem in genomics is how to detect the genetic signatures associated with disease.
In this paper we present a new biclustering method which can scale up efficiently to handle large genomic data sets, such as GWAS-data.

Summary

Introduction

Many applications in data-analysis involve some form of ‘biclustering’, also referred to as coclustering, two-mode clustering, two-way clustering, block clustering, and coupled two-way clustering, to name a few (see, e.g., [1,2,3,4,5]). The goal of biclustering is to search through a large data-array and reveal components that have special structure. These structured components involve only a subset of the rows and columns in the data-array, and finding them can be rather difficult (indeed, biclustering is NP-complete [6]). Because the problem is so general, many different kinds of biclustering algorithms have been developed for a variety of applications, ranging from political science to neuroscience [7, 8]. We demonstrate the efficacy of our loop-counting method by applying it to a gene-expression data-set and a GWAS data-set, using gene-enrichment analysis as a form of validation.
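The loop-counting idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes the data have already been binarized to ±1 (here simply by sign), it omits every covariate, control, and sparsity correction, and the function names `loop_scores` and `removal_order` as well as the 5% removal fraction are hypothetical choices made for illustration. The sketch relies on one fact: a 2-by-2 submatrix of ±1 entries has rank 1 exactly when the product of its four entries is +1, so summing that product over all loops through a row gives a signed tally of low-rank versus full-rank loops.

```python
import numpy as np


def loop_scores(B):
    """Signed loop tallies for each row and column of a +/-1 matrix B.

    For row j the score is sum_{j', k, k'} B[j,k] B[j',k] B[j',k'] B[j,k'],
    which equals the j-th row sum of (B B^T)**2. Rank-1 loops add +1 and
    rank-2 loops add -1. Degenerate loops (j' == j or k' == k) add the same
    constant to every row, so they do not change the ranking.
    """
    row_gram = B @ B.T
    col_gram = B.T @ B
    return (row_gram ** 2).sum(axis=1), (col_gram ** 2).sum(axis=1)


def removal_order(B, frac=0.05):
    """Repeatedly drop the lowest-scoring rows and columns.

    Returns the order in which rows and columns were removed. Indices removed
    last took part in the most low-rank loops, so they are the candidates for
    a low-rank bicluster. `frac` is an assumed tuning parameter.
    """
    B = np.asarray(B, dtype=np.int64)
    rows = np.arange(B.shape[0])
    cols = np.arange(B.shape[1])
    row_order, col_order = [], []
    while rows.size > 0 and cols.size > 0:
        sub = B[np.ix_(rows, cols)]
        r_score, c_score = loop_scores(sub)
        n_r = max(1, int(frac * rows.size))
        n_c = max(1, int(frac * cols.size))
        drop_r = np.argsort(r_score)[:n_r]
        drop_c = np.argsort(c_score)[:n_c]
        row_order.extend(rows[drop_r].tolist())
        col_order.extend(cols[drop_c].tolist())
        rows = np.delete(rows, drop_r)
        cols = np.delete(cols, drop_c)
    # Whatever is left once one side is exhausted is appended at the end.
    row_order.extend(rows.tolist())
    col_order.extend(cols.tolist())
    return row_order, col_order


# Synthetic check: Gaussian noise with a planted rank-1 block in the
# top-left 30-by-30 corner (sizes chosen arbitrarily for the demonstration).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 120))
A[:30, :30] = np.outer(rng.standard_normal(30), rng.standard_normal(30))
B = np.where(A > 0, 1, -1)  # binarize by sign (an assumption of this sketch)

row_order, col_order = removal_order(B)
candidate_rows = sorted(row_order[-30:])  # rows that survived longest
```

In this toy setting the rows that survive longest should largely coincide with the planted block, since each of them shares many rank-1 loops with the other planted rows. Computing the scores through the Gram matrices avoids enumerating loops one at a time, which keeps the cost at a few matrix products per iteration.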



Journal: PLOS Computational Biology
Publication Date: May 14, 2018
Citations: 6
License type: CC BY 4.0


Similar Papers

Identification of disease-associated pathways in pancreatic cancer by integrating genome-wide association study and gene expression data.
JIN LONG ... XINGDA WU
Oncology letters | VOL. 12
26 May 2016

Accounting for nonlinear effects of gene expression identifies additional associated genes in transcriptome-wide association studies.
Zhaotong Lin ... Katherine A Knutson
Human Molecular Genetics | VOL. 31
19 Jan 2022

The Shared Mechanism and Candidate Drugs of Multiple Sclerosis and Sjögren's Syndrome Analyzed by Bioinformatics Based on GWAS and Transcriptome Data.
Xiangxiang Hong ... Tingting Zhao
Frontiers in Immunology | VOL. 13
09 Mar 2022

Differential gene expression in dairy cows under negative energy balance and ketosis: A systematic review and meta-analysis
R.A.N Soares ... E.J Squires
Journal of Dairy Science | VOL. 104
12 Nov 2020
