Abstract
Recently, cancer has been characterized as a heterogeneous disease composed of many different subtypes. Early diagnosis of cancer subtypes is an important study of cancer research, which can be of tremendous help to patients after treatment. In this paper, we first extract a novel dataset, which contains gene expression, miRNA expression, and isoform expression of five cancers from The Cancer Genome Atlas (TCGA). Next, to avoid the effect of noise existing in 60, 483 genes, we select a small number of genes by using LASSO that employs gene expression and survival time of patients. Then, we construct one similarity kernel for each expression data by using Chebyshev distance. And also, We used SKF to fused the three similarity matrix composed of gene, Iso, and miRNA, and finally clustered the fused similarity matrix with spectral clustering. In the experimental results, our method has better P-value in the Cox model than other methods on 10 cancer data from Jiang Dataset and Novel Dataset. We have drawn different survival curves for different cancers and found that some genes play a key role in cancer. For breast cancer, we find out that HSPA2A, RNASE1, CLIC6, and IFITM1 are highly expressed in some specific groups. For lung cancer, we ensure that C4BPA, SESN3, and IRS1 are highly expressed in some specific groups. The code and all supporting data files are available from https://github.com/guofei-tju/Uncovering-Cancer-Subtypes-via-LASSO.
Highlights
Numerous studies have shown that cancer is a heterogeneous disease (Wang et al, 2005)
We analyze the performance of our method on the dataset in several ways
We introduce an evaluation criteria and a verification method that are used to evaluate the significant performance of cancer subtypes prediction
Summary
Numerous studies have shown that cancer is a heterogeneous disease (Wang et al, 2005). It is very meaningful to be able to accurately identify cancer subtypes, including molecular subtyping as well as clinical outcome-based clustering. With the development of whole-genome sequencing techniques in recent years, the diagnosis and treatments have gained great development (Wang K. et al, 2014; Haase et al, 2015). We have obtained massive cancer expression from database as The Cancer Genome Atlas (TCGA) (Tomczak et al, 2015). These expression data have positive influence on the development of the cancer subtype identification tools (Sohn et al, 2017; Guo Y. et al, 2018)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.