Abstract

Recently, cancer has been characterized as a heterogeneous disease composed of many different subtypes. Early diagnosis of cancer subtypes is an important topic in cancer research and can be of tremendous help to patients after treatment. In this paper, we first extract a novel dataset containing gene expression, miRNA expression, and isoform expression of five cancers from The Cancer Genome Atlas (TCGA). Next, to reduce the effect of noise among the 60,483 genes, we select a small number of genes using LASSO applied to gene expression and patient survival time. Then, we construct one similarity kernel for each type of expression data using the Chebyshev distance. We then use SKF to fuse the three similarity matrices (gene, isoform, and miRNA) and finally cluster the fused similarity matrix with spectral clustering. In the experimental results, our method achieves better P-values in the Cox model than other methods on 10 cancer datasets from the Jiang Dataset and the Novel Dataset. We draw survival curves for different cancers and find that some genes play a key role in cancer. For breast cancer, we find that HSPA2A, RNASE1, CLIC6, and IFITM1 are highly expressed in some specific groups. For lung cancer, we find that C4BPA, SESN3, and IRS1 are highly expressed in some specific groups. The code and all supporting data files are available from https://github.com/guofei-tju/Uncovering-Cancer-Subtypes-via-LASSO.
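The kernel, fusion, and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the Gaussian form of the kernel and its bandwidth `sigma`, the plain averaging used in place of SKF (whose details are not given here), the two-group split by the sign of the Fiedler vector in place of full spectral clustering, and the toy arrays `gene`, `iso`, and `mirna`. The LASSO gene-selection step is omitted.

```python
import numpy as np

def chebyshev_distance_matrix(X):
    """Pairwise Chebyshev (L-infinity) distances between the rows of X."""
    # diff[i, j, :] = |X[i] - X[j]|; the largest coordinate gap is the distance
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return diff.max(axis=2)

def similarity_kernel(X, sigma=2.0):
    """Similarity from Chebyshev distances (Gaussian form is an assumption)."""
    D = chebyshev_distance_matrix(X)
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))

def fuse_by_average(kernels):
    """Placeholder fusion: element-wise mean of the kernels, NOT SKF."""
    return np.mean(kernels, axis=0)

def spectral_bipartition(S):
    """Split samples into two groups by the sign of the Fiedler vector."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Hypothetical toy data: 6 patients, 2 features per view, two obvious groups
gene = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
iso = gene * 2.0      # stand-in for isoform expression
mirna = gene + 1.0    # stand-in for miRNA expression

kernels = [similarity_kernel(view) for view in (gene, iso, mirna)]
fused = fuse_by_average(kernels)
labels = spectral_bipartition(fused)
print(labels)
```

With real data, each view would be a patients-by-features matrix restricted to the selected genes, and the number of clusters would generally exceed two, which requires clustering several eigenvectors rather than thresholding one.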

Highlights

  • Numerous studies have shown that cancer is a heterogeneous disease (Wang et al, 2005)

  • We analyze the performance of our method on the dataset in several ways

  • We introduce an evaluation criterion and a verification method used to evaluate the performance of cancer subtype prediction



Introduction

Numerous studies have shown that cancer is a heterogeneous disease (Wang et al, 2005). Accurately identifying cancer subtypes, through molecular subtyping as well as clinical outcome-based clustering, is therefore of great value. With the development of whole-genome sequencing techniques in recent years, diagnosis and treatment have advanced considerably (Wang K. et al, 2014; Haase et al, 2015). Massive amounts of cancer expression data have been obtained from databases such as The Cancer Genome Atlas (TCGA) (Tomczak et al, 2015). These expression data have had a positive influence on the development of cancer subtype identification tools (Sohn et al, 2017; Guo Y. et al, 2018).
