Abstract

In the context of cancer, internal “checkerboard” structures are commonly found in gene expression matrices; they correspond to genes that are significantly up- or down-regulated in patients with specific types of tumors. In this paper, we propose a novel method called dual graph-regularization principal component analysis (DGPCA). Its main innovation is that it simultaneously considers the internal geometric structures of the condition manifold and the gene manifold. Specifically, we obtain principal components (PCs) to represent the data and approximate the cluster membership indicators through Laplacian embedding. Because the method incorporates both the condition manifold and the gene manifold, it is well suited to bi-clustering. A closed-form solution is provided for DGPCA. We apply the method to simultaneously cluster genes and conditions (e.g., different samples) in order to find internal “checkerboard” structures in gene expression data, if they exist. We then use it to identify regulatory genes under particular conditions and compare the results with those of other state-of-the-art PCA-based methods. Extensive experiments on gene expression data demonstrate promising results.
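The abstract does not spell out the DGPCA objective. The following is therefore only a minimal NumPy sketch of one plausible dual graph-regularized PCA, built on several assumptions. X is a genes × samples matrix. Both graphs are 0/1 k-nearest-neighbor graphs with unnormalized Laplacians. The objective is ||X - U V^T||_F^2 + alpha tr(V^T L_s V) + beta tr(U^T L_g U) with V^T V = I. U is fixed to XV, as in standard PCA. Under these assumptions, V has a closed form: the eigenvectors of -X^T X + alpha L_s + beta X^T L_g X with the smallest eigenvalues. The names `knn_laplacian` and `dual_graph_pca` and the parameters `alpha`, `beta`, and `k_nn` are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_laplacian(points, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric k-NN graph.

    Each row of `points` is a node. Edge weights are 0/1 (an assumption;
    heat-kernel weights are a common alternative).
    """
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared distances
    np.fill_diagonal(dist, np.inf)                              # no self-loops
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    w = np.maximum(w, w.T)                                      # symmetrize
    return np.diag(w.sum(axis=1)) - w


def dual_graph_pca(x, n_components, alpha=1.0, beta=1.0, k_nn=5):
    """Hypothetical dual graph-regularized PCA (not the paper's exact model).

    x is genes x samples (p x n). Assumed objective:
        ||X - U V^T||_F^2 + alpha*tr(V^T L_s V) + beta*tr(U^T L_g U),
    with V^T V = I and U fixed to X V (the standard PCA relation; only an
    approximation to the optimal U when beta > 0). Then V holds the
    eigenvectors of -X^T X + alpha*L_s + beta*X^T L_g X with the smallest
    eigenvalues.
    """
    x = x - x.mean(axis=1, keepdims=True)   # center each gene across samples
    l_s = knn_laplacian(x.T, k_nn)          # sample (condition) graph, n x n
    l_g = knn_laplacian(x, k_nn)            # gene graph, p x p
    g = -x.T @ x + alpha * l_s + beta * (x.T @ l_g @ x)
    _, eigvecs = np.linalg.eigh(g)          # eigenvalues in ascending order
    v = eigvecs[:, :n_components]           # sample embedding, n x k
    u = x @ v                               # gene-side factor, p x k
    return u, v
```

The rows of `v` could then be clustered (for example, with k-means) to group samples, and the rows of `u` to group genes. With `alpha = beta = 0`, the sketch reduces to ordinary PCA, because the smallest eigenvectors of -X^T X are the leading eigenvectors of X^T X.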

Highlights

  • With the development of molecular biology, the gene chip has become one of the most important technologies for gene functional annotation in the post-genomic era [1]

  • On the left is the checkerboard structure of the leukemia data, where each column corresponds to a sample; in the center are the principal directions; and on the right are the projected samples in the new subspace

  • In heat map (a), the 38 samples are arranged largely according to three labels: acute myelogenous leukemia (AML), T-cell acute lymphoblastic leukemia (ALL), and B-cell ALL
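The highlights describe a checkerboard matrix whose samples are projected onto principal directions. The leukemia data are not reproduced here. This minimal sketch instead uses a synthetic 2 × 2 checkerboard, whose sizes, shift, and noise level are arbitrary assumptions, and applies plain PCA through NumPy's SVD rather than DGPCA. It shows the basic idea: the signs of the first principal direction split the genes into up- and down-regulated blocks, and the signs of the first PC scores split the samples. `make_checkerboard` and `pca_biclusters` are hypothetical helpers.

```python
import numpy as np


def make_checkerboard(n_genes=60, n_samples=20, shift=3.0, noise=0.5, seed=0):
    """Synthetic genes x samples matrix with a 2 x 2 checkerboard.

    The first half of the genes is up-regulated in the first half of the
    samples and down-regulated in the second half; the other genes do the
    reverse. Returns the matrix and the true gene and sample signs.
    """
    rng = np.random.default_rng(seed)
    gene_sign = np.where(np.arange(n_genes) < n_genes // 2, 1.0, -1.0)
    sample_sign = np.where(np.arange(n_samples) < n_samples // 2, 1.0, -1.0)
    signal = shift * np.outer(gene_sign, sample_sign)
    x = signal + noise * rng.normal(size=(n_genes, n_samples))
    return x, gene_sign, sample_sign


def pca_biclusters(x):
    """Split genes and samples by the sign of the first principal component."""
    xc = x - x.mean(axis=1, keepdims=True)  # center each gene across samples
    u, _, vt = np.linalg.svd(xc, full_matrices=False)
    gene_groups = np.sign(u[:, 0])    # first principal direction (gene loadings)
    sample_groups = np.sign(vt[0])    # first PC scores of the samples, up to scale
    return gene_groups, sample_groups
```

Both sign vectors are determined only up to a common flip, so any comparison against known labels should accept either orientation.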


Summary

Biological analysis of PCA

With the development of molecular biology, the gene chip has become one of the most important technologies for gene functional annotation in the post-genomic era [1]. While retaining most of the information in the original data, principal component analysis (PCA) transforms the data to a low-dimensional linear or nearly linear subspace spanned by principal components (PCs) [3]. This method overcomes limitations of other bioinformatics methods in gene chip analysis and provides new. The selected components reduce the complexity of the gene chip variables and allow the resulting data to be clustered. This method provides a basis for the early diagnosis and subtyping of cancer.
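PCA keeps only the leading components, so some of the original variance is discarded. The fraction that is retained can be read directly from the singular values. The following is a minimal NumPy sketch, assuming a genes × samples matrix in which each gene is a variable and each sample is an observation. `pca_reduce` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def pca_reduce(x, k):
    """Project a genes x samples matrix onto its top-k principal components.

    Returns the k x n sample scores and the fraction of total variance kept.
    """
    xc = x - x.mean(axis=1, keepdims=True)   # center each gene across samples
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = s[:k, None] * vt[:k]            # samples in the k-dim PC subspace
    retained = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    return scores, retained
```

For example, `scores, kept = pca_reduce(x, 3)` gives each sample's coordinates in a three-dimensional PC subspace, and `kept` reports how much of the total variance that subspace preserves.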

Checkerboard structures in gene expression data and relations with PCA
We present a closed-form solution for this problem
RELATED WORK
Construct sample and gene graph
Objective function of dual graph-regularization PCA
Datasets
EXPERIMENTS
Experimental setting
Bi-clustering results to find “checkerboard” structure
RESULTS
Analysis of matching results
Finding regulatory genes under particular conditions
Visualization of overlapping results
Comparison with published results
Function analysis of unique regulatory genes
Gene interaction of biological pathway analysis
CONCLUSION