Abstract

Non-negative matrix factorization (NMF) is a matrix decomposition method based on the squared loss function. NMF is often used to reduce the dimensionality of cancer gene expression data so that cancer-related information can be extracted. Gene expression data usually contain noise and outliers, and the original NMF loss function is very sensitive to non-Gaussian noise. To improve the robustness and clustering performance of the algorithm, we propose a sparse graph regularization NMF based on the Huber loss model for cancer data analysis (Huber-SGNMF). Huber loss is a function intermediate between the L1-norm and the L2-norm that can effectively handle non-Gaussian noise and outliers. To account for matrix sparsity and the geometric structure of the data, sparse penalty and graph regularization terms are introduced into the model to enhance matrix sparsity and capture the data manifold structure. Before the experiments, we first analyze the robustness of Huber-SGNMF and other models. Experiments on The Cancer Genome Atlas (TCGA) data show that Huber-SGNMF performs better than other state-of-the-art methods in sample clustering and differentially expressed gene selection.
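
To make the statement that Huber loss sits between the L1-norm and the L2-norm concrete, the following is a minimal sketch of the standard elementwise Huber function: quadratic for small residuals, linear for large ones. The function name `huber_loss` and the threshold parameter `delta` are assumptions chosen for illustration; the parametrization used in Huber-SGNMF may differ.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Standard elementwise Huber loss (illustrative; `delta` is an assumed name).

    Behaves like a squared (L2-type) loss when |residual| <= delta and like
    an absolute (L1-type) loss beyond it, so large outliers contribute
    linearly rather than quadratically.
    """
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2               # small residuals: L2-like branch
    linear = delta * (r - 0.5 * delta)     # large residuals: L1-like branch
    return np.where(r <= delta, quadratic, linear)
```

With `delta=1.0`, a residual of 0.5 falls in the quadratic branch, while a residual of 3.0 is penalized linearly, which is what limits the influence of outliers compared with the squared loss.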

Highlights

  • Cancer is considered to be the number one threat to human health

  • We propose a model called sparse graph regularization non-negative matrix factorization (NMF) based on the Huber loss model for cancer data analysis (Huber-SGNMF)

  • The datasets pancreatic adenocarcinoma (PAAD), head and neck squamous cell carcinoma (HNSC), and colon adenocarcinoma (COAD) are integrated into one dataset, which is denoted PHD


Summary

Introduction

The development of high-throughput sequencing technology has enabled researchers to obtain more comprehensive information about cancer patients (Chen et al., 2019). Computational methods allow the gene expression data of cancer patients to be mined more effectively (Chen et al., 2018). Cancer gene expression data are characterized by high dimensionality, which makes data analysis extremely difficult. Effectively reducing the dimensionality of the data is therefore the key to analyzing cancer data. Principal component analysis (PCA) (Feng et al., 2019), locally linear embedding (LLE) (Roweis and Saul, 2000), and non-negative matrix factorization (NMF) (Yu et al., 2017) are common methods for reducing data dimensionality. NMF finds two non-negative matrices whose product approximately reconstructs the original matrix. NMF has demonstrated its advantages in facial recognition, speech processing, document clustering, and recommendation systems (Guillamet and Vitrià, 2002; Xu et al., 2003; Schmidt and Olsson, 2006; Luo et al., 2014).
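
The idea of approximating a non-negative data matrix by the product of two non-negative factors can be sketched with the classical multiplicative update rules for the squared (Frobenius) loss. This is an illustration of plain NMF only, not of Huber-SGNMF: it includes no Huber loss, sparse penalty, or graph regularization. The function name `nmf_multiplicative` and its parameters (`rank`, `n_iter`, `eps`, `seed`) are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF with multiplicative updates for the squared loss.

    Approximates a non-negative matrix X (m x n) by U @ V, where
    U (m x rank) and V (rank x n) stay non-negative throughout.
    All names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    U = rng.random((m, rank)) + eps
    V = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so no entry
        # can become negative; eps guards against division by zero.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V
```

For a gene expression matrix with genes as rows and samples as columns, the columns of `V` could serve as low-dimensional sample representations for clustering, with `rank` chosen as the expected number of clusters; that usage is likewise an assumption about a typical workflow rather than the procedure of the paper.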

