Abstract

Background: Cancer of unknown primary (CUP) is a malignant tumor that is histologically diagnosed as a metastatic carcinoma but whose tissue-of-origin cannot be identified. CUP accounts for roughly 5% of all cancers. Traditional treatment for CUP is primarily broad-spectrum chemotherapy, and the prognosis is relatively poor. Accurately inferring the tissue-of-origin of CUP is therefore clinically important.

Methods: We developed a gradient boosting framework to trace the tissue-of-origin of 20 types of solid tumors. Specifically, we downloaded the expression profiles of 20,501 genes for 7,713 samples from The Cancer Genome Atlas (TCGA) and used them as the training data set. RNA-seq data for 79 tumor samples from 6 cancer types with known origins were also downloaded from the Gene Expression Omnibus (GEO) as an independent data set.

Results: A total of 400 genes were selected to train a gradient boosting model for identifying the primary site of the tumor. The overall 10-fold cross-validation accuracy of our method was 96.1% across 20 types of cancer, and the accuracy on the independent data set reached 83.5%.

Conclusion: Our gradient boosting framework accurately identified the tumor tissue-of-origin on both the training data and the independent testing data, and may be of practical use.

Highlights

  • Cancer of unknown primary (CUP) is a type of malignant tumor, histologically diagnosed as a metastatic carcinoma with no confidently identified anatomical primary site even after comprehensive evaluation

  • CUP accounts for approximately 3% to 5% of all tumors [1,2,3,4]

  • The expression matrix was generated by normalizing the expression value of each sample and each gene from The Cancer Genome Atlas (TCGA)
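The last highlight mentions normalizing expression values but does not say which normalization was used. As a minimal sketch only, the block below assumes a per-gene z-score (subtract the gene's mean, divide by its standard deviation across samples), which is one common choice and not necessarily the authors' method. The function name `zscore_by_gene` and the toy matrix are hypothetical.

```python
from statistics import mean, pstdev


def zscore_by_gene(matrix):
    """Z-score each gene (column) across samples (rows).

    Assumption: per-gene z-scoring stands in for the unspecified
    normalization in the source. Genes with zero variance map to 0.0.
    """
    n_genes = len(matrix[0])
    # Collect each gene's values across all samples.
    columns = [[row[g] for row in matrix] for g in range(n_genes)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [
            (row[g] - stats[g][0]) / stats[g][1] if stats[g][1] > 0 else 0.0
            for g in range(n_genes)
        ]
        for row in matrix
    ]


# Toy data (made up): 3 samples x 2 genes; the second gene is constant.
toy = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normalized = zscore_by_gene(toy)
```

After this step each non-constant gene has mean 0 and unit variance across samples, which keeps highly expressed genes from dominating purely because of their scale.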

Summary

Introduction

Cancer of unknown primary (CUP) is a type of malignant tumor, histologically diagnosed as a metastatic carcinoma with no confidently identified anatomical primary site even after comprehensive evaluation. We developed a gradient boosting framework to trace the tissue-of-origin of 20 types of solid tumors. RNA-seq data for 79 tumor samples from 6 cancer types with known origins were downloaded from the Gene Expression Omnibus (GEO) as an independent data set. Our gradient boosting framework accurately identified the tumor tissue-of-origin on both the training data and the independent testing data, and may be of practical use.
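The abstract reports a 10-fold cross-validation accuracy on the 7,713 TCGA training samples. The sketch below illustrates how such a split and a pooled accuracy could be computed; it is not the authors' code. The names `kfold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical, the split is a plain shuffled split (whether the authors stratified by cancer type is not stated), and pooling correct predictions over all folds is an assumed reading of "overall" accuracy.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled, disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Pooled accuracy: total correct held-out predictions / n_samples.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a model on train_idx and returns the number of correct
    predictions on test_idx.
    """
    folds = kfold_indices(n_samples, k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Every sample outside the held-out fold is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        correct += fit_and_score(train_idx, test_idx)
    return correct / n_samples


# Fold sizes for the 7,713 training samples mentioned in the abstract.
folds = kfold_indices(7713, k=10)
fold_sizes = [len(f) for f in folds]
```

Each sample is held out exactly once, so the pooled accuracy is computed over all 7,713 samples. In practice the callback would wrap the gradient boosting model trained on the selected genes; the independent GEO samples are never part of these folds and serve as a separate check.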
