Abstract
Cancers come in many types. Although they share some hallmarks, such as proliferation and metastasis, they differ in many respects and arise in different organs or tissues. Does each cancer have a unique gene expression pattern that distinguishes it from other cancer types? Since the Cancer Genome Atlas (TCGA) project, pan-cancer studies have become increasingly common. Researchers want to derive robust gene expression signatures from pan-cancer patients, but heterogeneity causes large variance among patients, and the sample size needed for robust results would be too large to recruit. In this study, we took another approach to obtaining robust pan-cancer biomarkers: using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied in sequence to the cell line gene expression data, generating a feature list. This list was fed into the incremental feature selection (IFS) method, which incorporated a classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers performed well and can be useful tools for identifying cell lines from different cancer types, whereas the biomarkers (e.g., NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
Highlights
To get robust pan-cancer biomarkers, there are two approaches: increase the sample size or reduce the variance
A total of 54,634 features were removed, and 3,186 features were retained. These retained features are provided in Supplementary Table S1. These 3,186 features were further analyzed by using the max-relevance and min-redundancy (mRMR) method, and a feature ranking list was generated on the basis of their importance
The feature list produced by the mRMR method was fed into the incremental feature selection (IFS) method
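The mRMR ranking step mentioned above can be illustrated with a minimal sketch. It assumes the mutual-information-difference form of mRMR (relevance to the class label minus mean redundancy with already-ranked features) and assumes the expression values have already been discretised; neither detail is stated in this text, and the function names `mutual_information` and `mrmr_rank` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


def mrmr_rank(columns, labels):
    """Greedily rank features by relevance minus mean redundancy.

    columns: dict mapping a feature name to its list of discretised values.
    labels:  list of class labels, one per sample.
    Assumption: the difference criterion is used; other mRMR variants exist.
    """
    relevance = {f: mutual_information(v, labels) for f, v in columns.items()}
    ranked = []
    remaining = sorted(columns)  # sorted so that ties are broken deterministically

    def score(feature):
        if not ranked:
            return relevance[feature]
        redundancy = sum(
            mutual_information(columns[feature], columns[g]) for g in ranked
        ) / len(ranked)
        return relevance[feature] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

On a toy input, an exact duplicate of an already-selected feature is pushed down the ranking because its redundancy cancels its relevance, which is the behaviour the "min-redundancy" part of the name refers to.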
Summary
To get robust pan-cancer biomarkers, there are two approaches: increase the sample size or reduce the variance. The important genes were extracted by using the Boruta method (Kursa and Rudnicki, 2010). These genes were further analyzed with the max-relevance and min-redundancy (mRMR) method to evaluate their importance and sort them in a feature list. This list was fed into the incremental feature selection (IFS) method (Liu and Setiono, 1998), combined with a support vector machine (SVM) (Cortes and Vapnik, 1995) or a decision tree (DT) (Safavian and Landgrebe, 1991), to identify important genes and decision rules and to build powerful classifiers. This study gives new insight into pan-cancer studies and may provide novel targets for tumor-specific therapies.
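The IFS idea described above (evaluate a classifier on the top-k features of a ranked list for increasing k, then keep the best k) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a nearest-centroid classifier stands in for the SVM or DT, accuracy under a simple 5-fold split is used as the score, and all function names and parameters are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (NOT the SVM/DT named in the text):
    assign each test sample to the class with the closest mean vector."""
    rows_by_class = {}
    for row, label in zip(train_X, train_y):
        rows_by_class.setdefault(label, []).append(row)
    centroids = {
        label: [mean(col) for col in zip(*rows)]
        for label, rows in rows_by_class.items()
    }

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(centroids, key=lambda c: sq_dist(row, centroids[c])) for row in test_X
    ]


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Accuracy over a shuffled k-fold split (assumed evaluation scheme)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, step=1):
    """Score the top-k ranked features for k = step, 2*step, ... and keep the best k.

    ranked_features: column indices, most important first (e.g. from mRMR).
    Ties keep the smaller feature set because only strict improvements count.
    """
    best_k, best_score, scores = 0, -1.0, {}
    for k in range(step, len(ranked_features) + 1, step):
        cols = ranked_features[:k]
        X_k = [[row[j] for j in cols] for row in X]
        scores[k] = cv_accuracy(X_k, y)
        if scores[k] > best_score:
            best_k, best_score = k, scores[k]
    return ranked_features[:best_k], best_score, scores
```

Calling `incremental_feature_selection(X, y, ranked)` with `X` as a list of per-sample feature rows returns the selected feature indices, the best score, and the score for each k, which is enough to plot an IFS curve. Swapping in a different classifier only requires replacing `nearest_centroid_predict`.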