Dual Graph-Laplacian PCA: A Closed-Form Solution for Bi-Clustering to Find “Checkerboard” Structures on Gene Expression Data

Jin-Xing Liu, Yong Xu, Chun-Mei Feng, Xiang-Zhen Kong

doi:10.1109/access.2019.2941227

Open Access

https://doi.org/10.1109/access.2019.2941227

Abstract

In the context of cancer, internal “checkerboard” structures are normally found in the matrices of gene expression data, which correspond to genes that are significantly up- or down-regulated in patients with specific types of tumors. In this paper, we propose a novel method, called dual graph-regularization principal component analysis (DGPCA). The main innovation of this method is that it simultaneously considers the internal geometric structures of the condition manifold and the gene manifold. Specifically, we obtain principal components (PCs) to represent the data and approximate the cluster membership indicators through Laplacian embedding. This new method is endowed with internal geometric structures, such as the condition manifold and gene manifold, which are both suitable for bi-clustering. A closed-form solution is provided for DGPCA. We apply this new method to simultaneously cluster genes and conditions (e.g., different samples) with the aim of finding internal “checkerboard” structures on gene expression data, if they exist. Then, we use this new method to identify regulatory genes under the particular conditions and to compare the results with those of other state-of-the-art PCA-based methods. Promising results on gene expression data have been verified by extensive experiments.

Highlights

With the development of molecular biology, the gene chip has become one of the most important technologies of gene functional annotation in the post-genomic era [1].
On the left is the checkerboard structure of the leukemia data, where each column corresponds to a sample; in the center are the principal directions; and on the right are the projected samples in the new subspace.
In heat map (a), the arrangement of the 38 samples is generally based on the three types of labels: acute myelogenous leukemia (AML), T- and B-cells.

Summary

Biological analysis of PCA

With the development of molecular biology, the gene chip has become one of the most important technologies of gene functional annotation in the post-genomic era [1]. While retaining most of the information in the original data, principal component analysis (PCA) transforms the data to a low-dimensional linear or nearly linear subspace spanned by principal components (PCs) [3]. This method overcomes the limitations of bioinformatics methods in gene chip analysis and provides new. The selected information reduces the complexity of the gene chip variables and allows the resulting data to be clustered. This method provides the basis for early diagnosis and subtyping of cancer.
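The projection described above can be illustrated with a minimal sketch of ordinary PCA computed by singular value decomposition. The function name `pca_project`, the genes-by-samples orientation of `X`, and the random toy matrix are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def pca_project(X, k):
    """Project samples (columns of X) onto the top-k principal directions.

    X is assumed to have shape (n_genes, n_samples).
    """
    # Center each gene (row) across samples.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    directions = U[:, :k]        # principal directions, shape (n_genes, k)
    scores = directions.T @ Xc   # projected samples, shape (k, n_samples)
    # Fraction of total variance captured by the first k components.
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return directions, scores, explained

# Hypothetical toy data: 50 genes measured on 12 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))
directions, scores, explained = pca_project(X, k=2)
```

Here `explained` makes explicit how much of the variance survives the reduction, which is the sense in which the low-dimensional representation retains the original data.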

Checkerboard structures in gene expression data and relations with PCA

We present a closed-form solution for this problem
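To give a sense of what a closed-form solution looks like for graph-regularized PCA, the sketch below solves a single-graph problem, minimizing ||X - U V^T||_F^2 + alpha tr(V^T L V) subject to V^T V = I, through one eigendecomposition. This is a simplified stand-in and not the authors' DGPCA: it uses only a sample graph, and the kNN graph construction, the parameter `alpha`, the function names, and the toy checkerboard matrix are all assumptions for illustration.

```python
import numpy as np

def knn_laplacian(X, n_neighbors=3):
    """Unnormalized Laplacian L = D - W of a symmetric kNN graph over the columns of X."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns (samples).
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)   # a sample is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)         # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W

def graph_laplacian_pca(X, L, k, alpha):
    """Solve min ||X - U V^T||_F^2 + alpha * tr(V^T L V) subject to V^T V = I."""
    # For fixed V the optimal U is X V, and the objective reduces to
    # tr(V^T (alpha * L - X^T X) V) plus a constant, so V is given by the
    # eigenvectors belonging to the k smallest eigenvalues.
    G = alpha * L - X.T @ X
    eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    V = eigvecs[:, :k]
    U = X @ V
    return U, V

# Hypothetical checkerboard: 20 genes x 12 samples, two gene and two sample groups.
rng = np.random.default_rng(0)
blocks = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((10, 6)))
X = blocks + 0.1 * rng.normal(size=blocks.shape)
L = knn_laplacian(X, n_neighbors=3)
U, V = graph_laplacian_pca(X, L, k=2, alpha=0.5)
```

With `alpha=0` the sketch reduces to a rank-k PCA reconstruction `U @ V.T`; increasing `alpha` pulls the rows of `V` for neighboring samples closer together, which is what lets `V` act as an approximate cluster indicator.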

RELATED WORK

Construct sample and gene graph

Objective function of dual graph-regularization PCA

A 1XVVT

Datasets

EXPERIMENTS

Experimental setting

Bi-clustering results to find “checkerboard” structure

RESULTS

Analysis of matching results

Finding regulatory genes under the particular conditions

Visualization of overlapping results

Comparison with published results

Function analysis of unique regulatory genes

Gene interaction of biological pathway analysis

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 8	License type: CC BY 4.0