Abstract

Background: Despite anecdotal reports of differences in clinical and demographic characteristics of The Cancer Genome Atlas (TCGA) relative to general population cancer cases, differences have not been systematically evaluated.

Methods: Data from 11,160 cases with 33 cancer types were ascertained from the TCGA data portal. Corresponding data from the Surveillance, Epidemiology, and End Results (SEER) 18 and North American Association of Central Cancer Registries databases were obtained. Differences in characteristics were compared using Student’s t, Chi-square, and Fisher’s exact tests. Differences in mean survival months were assessed using restricted mean survival time analysis and a generalised linear model.

Results: TCGA cases were 3.9 years (95% CI 1.7–6.2) younger on average than SEER cases, with a significantly younger mean age for 20/33 cancer types. Although most cancer types had a similar sex distribution, race and stage at diagnosis distributions were disproportional for 13/18 and 25/26 assessed cancer types, respectively. Using 12 months as an end point, the observed mean survival months were longer for 27 of 33 TCGA cancer types.

Conclusions: Differences exist in the characteristics of TCGA vs. general population cancer cases. Our study highlights population subgroups where increased sample collection is warranted to increase the applicability of cancer genomic research results to all individuals.
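The restricted mean survival time mentioned in the Methods can be read as the area under a survival curve up to a fixed end point (12 months in the Results). The following is a minimal sketch of that idea using a Kaplan-Meier estimate; it is not the authors' code, the function names `kaplan_meier` and `rmst` are hypothetical, and the follow-up values in the usage lines are made up for illustration only.

```python
def kaplan_meier(times, events):
    """Return (time, survival just after time) for each distinct event time.

    times  : follow-up in months
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= leaving
    return steps


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last rectangle, from the final event time before tau up to tau.
    area += prev_s * (tau - prev_t)
    return area


# Illustrative (made-up) follow-up data, restricted to 12 months.
group_a = rmst([6, 20], [1, 0], tau=12)
group_b = rmst([20, 20], [0, 0], tau=12)
difference = group_b - group_a
```

Comparing the two restricted means gives a difference in survival months over the first 12 months, which is the kind of quantity the Results report; the study's actual estimation and adjustment steps are not described in this excerpt.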

Highlights

  • In recent years, progress in genome sequencing technologies and bioinformatics has provided enormous gains in understanding of the molecular aberrations associated with the development of various cancers

  • We extend the results from previous studies by comparing demographic and clinical characteristics between The Cancer Genome Atlas (TCGA) cases with 33 cancer types and cases in two population-based databases: (1) the SEER 18 database, which currently covers ~28% of the U.S. population,[17] and (2) the U.S. combined registries of the North American Association of Central Cancer Registries

  • Of 11,160 TCGA cases with 33 cancer types diagnosed between 1978 and 2013, 1097 cases were diagnosed with breast invasive carcinoma (BRCA) followed by glioblastoma multiforme (GBM, n = 596), ovarian serous cystadenocarcinoma (OV, n = 587), uterine corpus endometrial carcinoma (UCEC, n = 548), kidney renal clear cell carcinoma (KIRC, n = 537), head and neck squamous cell carcinoma (HNSC, n = 528), lung adenocarcinoma (LUAD, n = 522), and brain lower grade glioma (LGG, n = 515)


Summary

Introduction

Progress in genome sequencing technologies and bioinformatics has provided enormous gains in understanding of the molecular aberrations associated with the development of various cancers. The emergence of publicly available cancer genomic datasets, including The Cancer Genome Atlas (TCGA), facilitates a comprehensive understanding of the molecular pathogenesis of cancer and is allowing for the development of new strategies to improve cancer diagnosis, therapy, and prevention. By analysing these publicly available genomic data, many novel disease-associated genes have been uncovered.[1,2] More than 11,000 individuals with 33 cancer types have been included in the TCGA cohort.[3,4] These data have so far contributed to more than 2000 studies of various cancers indexed in PubMed. Despite anecdotal reports of differences in clinical and demographic characteristics of TCGA cases relative to general population cancer cases, differences have not been systematically evaluated. Our study highlights population subgroups where increased sample collection is warranted to increase the applicability of cancer genomic research results to all individuals.

