Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data

Wei Kong, Xiaohua Hu, Xiaoyang Mou

doi:10.1186/1471-2105-12-s5-s7

Abstract

Background: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets.

Methods: Unsupervised machine learning approaches provide new and efficient analysis of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer’s disease. The advantage of these two approaches is that they can be used as biclustering methods, in which genes and conditions are clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The two methods differ in that ICA assumes statistical independence of the expression modes, while NMF requires positivity constraints to generate localized gene expression profiles.

Results: In our work, we applied FastICA and non-smooth NMF separately to DNA microarray gene expression data of Alzheimer’s disease. The simulation results show that both methods can clearly separate severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The results confirm that the combination of these two methods is efficient.

Conclusions: Unsupervised matrix factorization methods provide efficient tools to analyze high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information in microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis proves highly efficient for elucidating the molecular taxonomy of Alzheimer’s disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.
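The abstract contrasts the assumptions behind ICA, NMF and non-smooth NMF without writing them out. A compact way to state them is sketched here; the symbols (X for the expression matrix, k for the number of components, θ for the smoothing parameter) are notation assumed for illustration and are not taken from the paper, and the orientation of X (genes as rows or as columns) is left as a convention.

```latex
% ICA: linear mixing model; the rows of S (the "expression modes")
% are assumed to be mutually statistically independent.
X = A S

% NMF: low-rank approximation with elementwise nonnegativity constraints.
X \approx W H, \qquad W \ge 0,\; H \ge 0

% Non-smooth NMF (nsNMF): a smoothing matrix S_\theta is inserted between
% the factors; \theta = 0 recovers plain NMF, and larger \theta pushes
% W and H toward sparser, more localized solutions.
X \approx W S_\theta H, \qquad
S_\theta = (1 - \theta)\, I_k + \frac{\theta}{k}\, \mathbf{1}\mathbf{1}^{\mathsf{T}},
\qquad 0 \le \theta \le 1
```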

Highlights

The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes
To evaluate independent component analysis (ICA) and non-smooth NMF (nsNMF) applied to Alzheimer’s disease (AD) DNA gene expression data, we used the data set of hippocampal gene expression of control and AD samples from GEO DataSets offered by Eric M
We excluded the samples with significant noise and chose 8 control and 5 severe AD samples with 6398 genes to test FastICA (http://www.cis.hut.fi/projects/ica/fastica/) [23] and the nsNMF method
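To make the factorization step concrete, the following is a minimal sketch of plain NMF with Lee-Seung multiplicative updates on a small genes-by-samples matrix. It is not the nsNMF variant used in the study and not the authors' code: the toy matrix, the function names (`matmul`, `transpose`, `frobenius_error`, `nmf`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual x - w @ h."""
    wh = matmul(w, h)
    return sum((xi - ri) ** 2
               for x_row, r_row in zip(x, wh)
               for xi, ri in zip(x_row, r_row)) ** 0.5


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF (Frobenius loss) via multiplicative updates.

    x is genes x samples; returns w (genes x rank) and h (rank x samples).
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(x), len(x[0])
    # Strictly positive random start keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_samples)] for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(rank)] for i in range(n_genes)]
    return w, h


# Hypothetical toy data: 6 genes (rows) x 5 samples (columns).
# Columns 0-2 stand in for control samples, columns 3-4 for AD samples.
expression = [
    [5, 5, 4, 1, 1],
    [4, 5, 5, 0, 1],
    [5, 4, 5, 1, 0],
    [1, 0, 1, 5, 4],
    [0, 1, 1, 4, 5],
    [1, 1, 0, 5, 5],
]

w, h = nmf(expression, rank=2)

print("residual:", round(frobenius_error(expression, w, h), 3))
for component in h:
    print([round(v, 2) for v in component])
```

Each row of `h` gives one component's weight across samples, and each column of `w` gives that component's gene loadings, so the same factorization groups samples and genes at once. On the real data the matrix would have 6398 rows and 13 columns, where a vectorized library implementation would be the practical choice.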

Summary

Introduction

The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Researchers have developed various methods for clustering and identifying groups of genes or experimental conditions that exhibit similar expression patterns, such as k-means [1], self-organizing maps (SOM) [2,3] and hierarchical clustering (HC) [4]. These clustering algorithms suffer from two limitations: first, they group genes (or conditions) based on global similarities in their expression profiles; second, they assign each gene to only a single cluster. The resulting clusters are difficult for biologists to interpret because of the large number of genes and the complex underlying inter-gene dependencies.

Journal: BMC Bioinformatics	Publication Date: Jul 27, 2011
Citations: 50	License type: CC BY 2.0
