Abstract

Prompted by the considerable bias toward cases of European ancestry in large cancer genomics repositories, the New York Genome Center initiated an effort to diversify the public datasets that form the foundation of current cancer research, so that modern genomically driven discoveries can be applied without bias to a broader population. Our effort focused on 8 cancer types (breast, prostate, bladder, lung, pancreatic, colon and endometrial cancer, as well as multiple myeloma) and generated Illumina tumor-normal whole-genome sequencing (WGS) and tumor transcriptome sequencing (RNASeq). The majority of the samples were obtained retrospectively from clinics and hospitals in the New York region, and the analyses are conducted with collaborators at these institutions. Our dataset is currently composed of 851 tumor-normal WGS and 471 RNASeq, and is progressively being distributed to the scientific community in partnership with ICGC-ARGO. Overall, 75% of the cohort is predominantly of non-European ancestry, as estimated by genetic ancestry analysis using the 1000 Genomes continental references. We analyzed the samples with NYGC’s somatic variant calling pipeline and will present results from the 8 cancer types related to germline and somatic variants (single-nucleotide variants, indels, copy-number alterations, complex structural variants) as well as mutational signatures, neoantigen prediction, fusion genes and expression profiles, in relation to genetic ancestry estimation and clinical features of the patients. We anticipate that the dataset will significantly diversify the genetic ancestry of tumor profiles in public databases and help form hypotheses regarding the contribution of ancestry to cancer disparities. The experience of the Polyethnic-1000 project informs our contribution to the Cancer Grand Challenge SAMBAI, which focuses on breast, prostate and pancreatic cancers in the African Diaspora.
SAMBAI will further investigate the causes of cancer disparity by combining genomic data with social determinants of health and exposomics information, deeper immune profiling, and new genomics technologies such as long-read sequencing and the use of a pangenome reference.

Citation Format: Nicolas Robine, Lara Winterkorn, Tim Chu, Zoe Goldstein, Will Hooper, Ali Oku, Heather M. Geiger, Melissa B. Davis, Onyinye Balogun. Polyethnic-1000: Advancing cancer genomics by studying ethnically diverse, underserved patient populations in New York [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr C102.