Transcription network construction for large-scale microarray datasets using a high-performance computing approach

Mengxia (Michelle) Zhu, Qishi Wu

doi:10.1186/1471-2164-9-s1-s5

Open Access

Abstract

Background

Advances in high-throughput genomic technologies, including microarrays, have demonstrated the potential to generate a tremendous amount of gene expression data for the entire genome. Deciphering transcriptional networks that convey information on intracluster correlations and intercluster connections of genes is a crucial analysis task in the post-sequence era. Most existing analysis methods for genome-wide gene expression profiles consist of several steps that often require human involvement based on experiential knowledge that is generally difficult to acquire and formalize. Moreover, large-scale datasets typically incur prohibitively expensive computation overhead and thus result in a long experiment-analysis research cycle.

Results

We propose a parallel computation-based random matrix theory approach to analyze the cross correlations of gene expression data in an entirely automatic and objective manner, eliminating the ambiguity and subjectivity inherent in human decisions. We apply the proposed approach to publicly available human liver cancer data and yeast cycle data, and generate transcriptional networks that illustrate interacting functional modules. The experimental results are consistent with those published in the literature.

Conclusions

The correlations calculated from experimental measurements typically contain both “genuine” and “random” components. In the proposed approach, we remove the “random” component by testing the statistics of the eigenvalues of the correlation matrix against a “null hypothesis”: a truly random correlation matrix obtained from mutually uncorrelated expression data series. Our investigation into the components of the deviating eigenvectors after varimax orthogonal rotation reveals distinct functional modules.

The use of high-performance computing resources, including the ScaLAPACK package, a supercomputer, and a Linux PC cluster, in our implementations and experiments significantly reduces the computation time that would otherwise be needed on a single workstation. More importantly, the large distributed shared memory and parallel computing power allow us to process genomic datasets of enormous size.
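The eigenvalue test described in the Conclusions can be illustrated with a small single-machine sketch. It is not the authors' implementation: it uses NumPy rather than ScaLAPACK, it takes the analytic Marchenko-Pastur upper edge as the "null hypothesis" for uncorrelated series (an assumption; the abstract only says the null is a random correlation matrix), it skips the varimax rotation, and the data are synthetic. The function names `marchenko_pastur_bounds`, `deviating_eigenpairs`, and `demo` are hypothetical.

```python
import numpy as np


def marchenko_pastur_bounds(n_genes, n_samples):
    """Eigenvalue range expected for the correlation matrix of
    mutually uncorrelated series (assumed null model)."""
    ratio = n_genes / n_samples
    lower = (1.0 - np.sqrt(ratio)) ** 2
    upper = (1.0 + np.sqrt(ratio)) ** 2
    return lower, upper


def deviating_eigenpairs(expression):
    """Return eigenvalues/eigenvectors above the null upper edge.

    expression: array of shape (n_genes, n_samples).
    """
    n_genes, n_samples = expression.shape
    corr = np.corrcoef(expression)            # gene-by-gene correlations
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending eigenvalues
    _, upper = marchenko_pastur_bounds(n_genes, n_samples)
    mask = eigvals > upper                    # candidate "genuine" components
    return eigvals[mask], eigvecs[:, mask], upper


def demo(seed=0):
    """Synthetic example: genes 0..9 share one hidden expression pattern."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples, module_size = 50, 200, 10
    data = rng.standard_normal((n_genes, n_samples))
    shared = rng.standard_normal(n_samples)
    data[:module_size] += 2.0 * shared        # plant one co-expressed module
    vals, vecs, upper = deviating_eigenpairs(data)
    top = vecs[:, -1]                         # eigenvector of largest eigenvalue
    members = np.argsort(np.abs(top))[-module_size:]
    return vals, upper, sorted(members.tolist())


if __name__ == "__main__":
    vals, upper, members = demo()
    print("null upper edge:", upper)
    print("largest eigenvalue:", vals[-1])
    print("genes with largest loadings:", members)
```

In this toy setting the genes with the largest loadings on the top deviating eigenvector are the planted module. For finite datasets the largest eigenvalue of pure noise can fluctuate slightly around the analytic edge, so a threshold estimated from shuffled data may be a safer null in practice.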

Highlights

Advances in high-throughput genomic technologies, including microarrays, have demonstrated the potential to generate a tremendous amount of gene expression data for the entire genome.
We propose to develop a system that constructs and analyzes various aspects of transcriptional networks based on random matrix theory (RMT) [13,14], using ScaLAPACK [15,16] for parallel calculation of linear algebra routines.
Results from the RMT method: the entire yeast genome is partitioned into a large number of functional modules sharing similar expression patterns.

Summary

Introduction

Advances in high-throughput genomic technologies, including microarrays, have demonstrated the potential to generate a tremendous amount of gene expression data for the entire genome. Analysis is usually carried out one gene at a time to identify genes that are differentially expressed at a significant level. Such point-level analysis does not realize the full potential of genome-scale experiments. Genes ascribed to the same cluster are usually responsible for a specific physiological process or belong to the same molecular complex. Such transcriptome (mRNA) datasets deliver new knowledge and provide revealing insight into the existing genome (gene) datasets, and can be used to guide proteome (protein) and interactome research that aims to extract key biological features, such as protein-protein interactions and subcellular localizations, more accurately and efficiently.

Journal: BMC Genomics	Publication Date: Jan 1, 2008
Citations: 28	License type: cc-by
