PCA via joint graph Laplacian and sparse constraint: Identification of differentially expressed genes and sample clustering on gene expression data

Chun-Mei Feng, Jun-Liang Shang, Ling-Yun Dai, Mi-Xiao Hou, Yong Xu

doi:10.1186/s12859-019-3229-z

Open Access

Abstract

Background

In recent years, identifying differentially expressed genes and clustering samples have become active research topics in bioinformatics. Principal Component Analysis (PCA) is widely used to analyze gene expression data, but it has two limitations. First, it does not exploit the geometric structure hidden in the data, such as the pairwise distances between data points, even though this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, whereas only a few genes are related to a given cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer.

Results

In this study, a new method, gLSPCA, is proposed that integrates both a graph Laplacian and a sparsity constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs.

Conclusions

Experiments comparing gLSPCA with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt and gLPCA, are performed on real datasets of pancreatic cancer (PAAD) and head and neck squamous cell carcinoma (HNSC). The results demonstrate that gLSPCA is effective both in identifying differentially expressed genes and in sample clustering. In addition, applying gLSPCA to these datasets provides several new clues for exploring the causative factors of PAAD and HNSC.
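The first limitation named in the abstract, that PCA ignores pairwise relations between samples, is what a graph Laplacian term targets. The sketch below is a minimal illustration, not the gLSPCA algorithm: it implements the closed-form graph-Laplacian PCA used by gLPCA (one of the compared methods), min ||X - U V^T||_F^2 + alpha * tr(V^T L V) subject to V^T V = I, whose solution takes V as the eigenvectors of -X^T X + alpha * L with the smallest eigenvalues and sets U = X V. Because tr(V^T L V) = 1/2 * sum_ij W_ij * ||v_i - v_j||^2 for L = D - W, the penalty is small when neighboring samples receive similar embeddings. NumPy, the symmetrized k-nearest-neighbor graph with 0/1 weights, the helper names, `alpha`, `n_neighbors` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def knn_graph_laplacian(X, n_neighbors=3):
    """Unnormalized graph Laplacian L = D - W over the samples (columns of X).

    Assumption: a symmetrized k-nearest-neighbor graph with 0/1 edge weights.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between samples.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:n_neighbors]] = 1.0
    W = np.maximum(W, W.T)  # connect i and j if either lists the other
    return np.diag(W.sum(axis=1)) - W


def graph_laplacian_pca(X, k, alpha, n_neighbors=3):
    """gLPCA-style closed form for
        min_{U,V} ||X - U V^T||_F^2 + alpha * tr(V^T L V)  s.t.  V^T V = I.

    V = eigenvectors of G = -X^T X + alpha * L with the k smallest eigenvalues,
    U = X V. X is genes x samples and assumed centered per gene.
    """
    L = knn_graph_laplacian(X, n_neighbors)
    G = -X.T @ X + alpha * L
    _, eigvecs = np.linalg.eigh(G)  # eigh returns eigenvalues in ascending order
    V = eigvecs[:, :k]              # samples x k embedding, usable for clustering
    U = X @ V                       # genes x k loadings, dense (no sparsity)
    return U, V, L


# Hypothetical toy data: 50 genes x 20 samples, genes 0-4 shifted between two groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
X[:5, :10] += 3.0
X[:5, 10:] -= 3.0
X -= X.mean(axis=1, keepdims=True)  # center each gene across samples

U, V, L = graph_laplacian_pca(X, k=2, alpha=1.0)
print(np.sign(V[:, 0]))  # the first coordinate should split the two sample groups
```

Setting `alpha=0` recovers ordinary PCA (the leading right singular vectors of X); larger values pull graph neighbors toward similar coordinates, which is the property the abstract links to better sample clustering. The loadings U stay dense, which is exactly the second limitation.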

Highlights

In recent years, identification of differentially expressed genes and sample clustering have become active research topics in bioinformatics.
The contributions of this paper are as follows: (i) we propose a novel method, PCA via joint graph Laplacian and sparse constraint (gLSPCA), which simultaneously learns the internal geometric structure of the data and improves the interpretability of the Principal Components (PCs); gLSPCA can both identify differentially expressed genes and be applied to sample clustering.
Extensive experiments on identifying differentially expressed genes and on sample clustering are conducted in the Results and discussion section, where related sparse Principal Component Analysis (PCA) methods are compared with our method.
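The highlights attribute interpretability to sparse PCs: a PC with only a few nonzero gene loadings points directly at candidate differentially expressed genes. The sparse term and solver of gLSPCA are not described in this summary, so this sketch swaps in a different, well-known technique, the truncated power method for sparse PCA (Yuan and Zhang), and only illustrates what a sparsity constraint does to a PC; it omits the graph Laplacian term entirely. It assumes NumPy, X stored as genes x samples, a hypothetical cardinality parameter `card`, and a synthetic matrix in which only genes 0-4 differ between two sample groups.

```python
import numpy as np


def truncated_power_pc(X, card, n_iter=50):
    """Leading sparse loading vector via the truncated power method.

    X is genes x samples (centered per gene). At most `card` genes keep a
    nonzero loading; all other loadings are set exactly to zero.
    """
    if not 1 <= card <= X.shape[0]:
        raise ValueError("card must be between 1 and the number of genes")
    S = X @ X.T  # genes x genes scatter matrix
    # Deterministic start: indicator of the `card` largest diagonal entries.
    u = np.zeros(S.shape[0])
    u[np.argsort(np.diag(S))[-card:]] = 1.0
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = S @ u  # power step
        u[np.argsort(np.abs(u))[:-card]] = 0.0  # keep only the `card` largest |entries|
        u /= np.linalg.norm(u)
    return u


# Hypothetical toy data: 50 genes x 20 samples, genes 0-4 shifted between two groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
X[:5, :10] += 3.0
X[:5, 10:] -= 3.0
X -= X.mean(axis=1, keepdims=True)  # center each gene across samples

sparse_pc = truncated_power_pc(X, card=5)
dense_pc = np.linalg.svd(X, full_matrices=False)[0][:, 0]  # ordinary PCA loading
print("genes with nonzero sparse loadings:", np.flatnonzero(sparse_pc))
print("nonzero loadings in the dense PC:", np.count_nonzero(dense_pc))
```

Ordinary PCA spreads the first loading over all 50 genes, while the truncated version keeps only `card` of them; in this toy setup those should be the five shifted genes. Choosing `card` (or, in penalized formulations, a sparsity weight) is the main tuning decision.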

Journal: BMC Bioinformatics
Publication date: December 1, 2019
Citations: 8
License type: open-access

Similar Papers

Abstract 3718: NSD histone methyltransferases drive cell proliferation in HPV-negative head and neck squamous cell carcinoma (HNSCC)
Iuliia Topchu ... Jochen Lorch
Cancer Research | Vol. 82 | 15 Jun 2022

Sparse gene expression data analysis based on truncated power
Ningmin Shen ... Peiyun Zhou
01 Nov 2014

Identifying splits with clear separation: a new class discovery method for gene expression data
Anja Von Heydebreck ... Annemarie Poustka
Bioinformatics (Oxford, England) | Vol. 17, Suppl. 1 | 01 Jun 2001

Author response: Sparse dimensionality reduction approaches in Mendelian randomisation with highly correlated exposures
Vasileios Karageorgiou ... Dipender Gill
28 Nov 2022
