Abstract

6588

Background: Large publicly available databases are important resources for clinicogenomic research aimed at identifying clinically relevant biomarkers. Diversity among the individuals in these repositories is key to ensuring that findings apply to patient populations.
Methods: We compared two publicly available pan-cancer databases from academic institutions, The Cancer Genome Atlas (TCGA) and the United States (US) institutions in the American Association for Cancer Research (AACR) Project GENIE version 11.0 (APG), with cancer incidence statistics from the US Cancer Statistics (USCS) for 2018, the most recent data available. We compared demographic data for key individual cancer types (lung, colorectal, prostate, breast, gliomas, and leukemias) by gender, race, and ethnicity. Frequencies are displayed as percentages and were compared using the chi-squared test.
Results: The USCS includes 1,708,921 new cases in 2018, while TCGA includes 12,958 cases and APG includes 109,041 cases. Women account for 49.5% of all cancer diagnoses and, similarly, 50.4% of all cases in AACR and 51.2% of TCGA cases. The table summarizes key demographic differences. Across all cancer types, 78% of US cancers occur in White patients, whereas 84% and 83% of patients were White in AACR and TCGA, respectively. Although 16% of prostate cancer cases occur in Black patients, only 9% (n = 328/3993) of AACR and 11% (n = 51/484) of TCGA cases were in Black patients (p < 0.01). In contrast, although Black patients account for only 2% of all breast cancer diagnoses, they accounted for 9% (n = 841/9871) of AACR and 16% (n = 203/1343) of TCGA cases (p < 0.01). Patients of Hispanic ethnicity were underrepresented in the overall population and in every single tumor type, with Hispanic patients accounting for 8% of cases in USCS but only 5% in AACR and TCGA (p < 0.01). Pancreatic and lung cancers, which have historically short survival, both had lower median ages at sequencing than at diagnosis. Median age at sequencing for pancreatic cancers was 65 years in both TCGA and AACR, whereas median age at diagnosis in the US is 70 years. Median age at diagnosis of lung cancer is 71 years in the US, whereas median age at sequencing was 67 years in both AACR and TCGA.
Conclusions: Older patients and patients of minority races are underrepresented in publicly available American databases. To support informed analyses of genomic databases, these databases must reflect the diversity of the population, and efforts must be made to increase the representation of underrepresented groups. [Table: see text]
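
The abstract does not say which form of the chi-squared test was used, so the following is only an illustrative sketch. It assumes a one-degree-of-freedom goodness-of-fit test that compares the share of Black patients in each database's prostate cancer cohort with the USCS share. The function name `chi2_gof_two_categories` is hypothetical, and the 16% reference share is the rounded value quoted in the abstract, so the statistics are approximate.

```python
import math

def chi2_gof_two_categories(observed_in_group, total, expected_share):
    """Chi-squared goodness-of-fit (1 df): one group versus all other patients."""
    observed = [observed_in_group, total - observed_in_group]
    expected = [total * expected_share, total * (1 - expected_share)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Prostate cancer counts quoted in the abstract: (Black patients, all patients).
cohorts = {"AACR": (328, 3993), "TCGA": (51, 484)}
USCS_BLACK_SHARE = 0.16  # assumption: rounded USCS percentage from the abstract

results = {}
for name, (black, total) in cohorts.items():
    results[name] = chi2_gof_two_categories(black, total, USCS_BLACK_SHARE)
    stat, p = results[name]
    print(f"{name}: chi2 = {stat:.1f}, p < 0.01: {p < 0.01}")
```

Under these assumptions both cohorts fall below the 0.01 threshold, in line with the reported p < 0.01. A test of independence on a full contingency table would be an equally plausible reading of the methods.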
