Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines.

Shijian Ding, Tao Huang, Hao Li, Yu-Dong Cai, Kaiyan Feng, Zhandong Li, Lei Chen, Yu-Hang Zhang, Xianchao Zhou

doi:10.3389/fcell.2021.781285

Abstract

There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they differ in many respects, including the organs or tissues in which they grow. Does each cancer have a unique gene expression pattern that distinguishes it from other cancer types? Since the Cancer Genome Atlas (TCGA) project, the number of pan-cancer studies has grown steadily. Researchers want to obtain robust gene expression signatures from pan-cancer patients, but heterogeneity causes large variance among cancer patients, and the sample size needed for robust results would be too large to recruit. In this study, we took another approach to obtaining robust pan-cancer biomarkers: using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied to the cell line gene expression data in sequence, generating a feature list. This list was fed into the incremental feature selection (IFS) method, which incorporated a classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers performed well and can be useful tools for identifying cell lines from different cancer types, whereas the biomarkers (e.g., NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.

Highlights

To obtain robust pan-cancer biomarkers, there are two approaches: increase the sample size or reduce the variance.
In the Boruta step, a total of 54,634 features were removed and 3,186 features were retained; the retained features are provided in Supplementary Table S1. These 3,186 features were then analyzed with the max-relevance and min-redundancy (mRMR) method, which produced a feature list ranked by importance.
The feature list produced by the mRMR method was fed into the incremental feature selection (IFS) method.
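
The mRMR ranking step can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes expression values have already been discretized, uses a plain count-based mutual information estimate, and scores each candidate as relevance minus mean redundancy, which is one common mRMR variant. The feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedily rank features by relevance to the labels minus mean
    redundancy with the features already ranked.

    features: dict mapping feature name -> list of discrete values per sample
    labels:   list of class labels per sample
    """
    relevance = {name: mutual_information(values, labels)
                 for name, values in features.items()}
    remaining = list(features)
    ranked = []

    def score(name):
        if not ranked:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in ranked) / len(ranked)
        return relevance[name] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical toy data: 8 samples, 4 classes, 4 discretized features.
toy_labels = [0, 0, 1, 1, 2, 2, 3, 3]
toy_features = {
    "gene_a":      [0, 0, 0, 0, 1, 1, 1, 1],  # informative
    "gene_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # informative but redundant with gene_a
    "gene_b":      [0, 0, 1, 1, 0, 0, 1, 1],  # informative, complements gene_a
    "noise":       [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
}
toy_ranking = mrmr_rank(toy_features, toy_labels)
print(toy_ranking)
```

In this toy case the duplicate feature is as relevant as the original, but it is ranked after the complementary feature because its redundancy with the already selected feature lowers its score.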

Summary

Introduction

To obtain robust pan-cancer biomarkers, there are two approaches: increase the sample size or reduce the variance. The important genes were extracted by using the Boruta method (Kursa and Rudnicki, 2010). These genes were further analyzed with the max-relevance and min-redundancy (mRMR) method to evaluate their importance and sort them in a feature list. This list was fed into the incremental feature selection (IFS) method (Liu and Setiono, 1998), combined with a support vector machine (SVM) (Cortes and Vapnik, 1995) or a decision tree (DT) (Safavian and Landgrebe, 1991), to identify important genes and decision rules and to build powerful classifiers. This study gives new insight into pan-cancer studies and may provide novel targets for tumor-specific therapies.
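
The IFS idea described above can be sketched as follows: take the ranked feature list, evaluate a classifier on the top 1, top 2, ..., top N features, and keep the smallest prefix with the best score. This is a minimal illustration under stated assumptions, not the study's implementation. A nearest-centroid classifier stands in for the SVM or DT to keep the example dependency-free, leave-one-out accuracy is an assumed evaluation scheme, and the data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose training centroid is closest to x.
    Stand-in classifier; the paper combines IFS with SVM or DT."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def sq_dist(label):
        return sum((s / counts[label] - xi) ** 2
                   for s, xi in zip(sums[label], x))

    return min(sums, key=sq_dist)


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy using only the feature columns in cols."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        test_x = [X[i][c] for c in cols]
        correct += nearest_centroid_predict(train_X, train_y, test_x) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_cols, evaluate=loo_accuracy):
    """Evaluate every prefix of the ranked list; return the smallest
    best-scoring prefix and the score of each prefix."""
    scores = [evaluate(X, y, ranked_cols[:k])
              for k in range(1, len(ranked_cols) + 1)]
    best_k = scores.index(max(scores)) + 1  # first maximum = smallest prefix
    return ranked_cols[:best_k], scores


# Hypothetical toy data: columns 0 and 1 are informative, column 2 is noise.
toy_X = [
    [0, 0, 0], [1, 1, 100],      # class A
    [10, 0, 100], [11, 1, 0],    # class B
    [10, 10, 0], [11, 11, 100],  # class C
]
toy_y = ["A", "A", "B", "B", "C", "C"]
selected, prefix_scores = incremental_feature_selection(toy_X, toy_y, [0, 1, 2])
print(selected, prefix_scores)
```

Here the top two features give perfect leave-one-out accuracy and adding the noisy third feature lowers it, so the sketch keeps the first two features.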



Journal: Frontiers in Cell and Developmental Biology
Publication Date: Nov 30, 2021
Citations: 13
License type: CC BY 4.0
