Abstract
Ninety-five percent of sequencing data comes from individuals of European descent. The underrepresentation of non-European individuals in genomic resources has limited our understanding of how population-specific variants affect complex traits and disease. Human populations have undergone selection for variants that aid survival in their environments, so individuals carry genetic variants unique to their ancestral populations. Although these variants can affect disease occurrence and treatment outcome, genetic ancestry is often unaccounted for in disease modeling, and the lack of ancestral population representation in data has led to disparities in disease outcomes and treatment. The primary proposed solution to this problem is more diverse sequencing of patients, but this will take years and require continual work to maintain the diversity balance. It also will not fully solve the issue, because clinical trials cannot be expected to capture global human diversity at every stage. Here we show that data bias can be mitigated by using readily available genomic population databases to target genes with ancestrally relevant variation. Our AI model, PhyloFrame, creates equitable gene signatures from transcriptomic input data. PhyloFrame corrects for ancestral bias in data by using genomic population databases and functional interaction networks to identify genes that are both relevant to disease and associated with a variant unique to a single ancestry. PhyloFrame uses gnomAD data to quantify genome-wide, ancestry-specific allele frequencies from a large cohort of healthy individuals, and uses these frequencies to create ancestry-agnostic signatures when modeling disease data. Our model demonstrates the utility of incorporating readily available population-level data from healthy individuals into AI methods to overcome existing bias in genomic data.
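The abstract does not state how PhyloFrame decides that a variant is unique to one ancestry or how it uses the interaction network. Purely as an illustration of the general idea, the Python sketch below assumes a simple rule: a variant counts as ancestry-specific if its allele frequency is at least `high` in exactly one population and below `low` in all others, and a gene is prioritized if it carries such a variant and is a disease gene or a direct network neighbor of one. The toy variants, population labels, thresholds, gene names, and helper functions are all hypothetical and are not PhyloFrame's actual method or gnomAD's data format.

```python
from collections import defaultdict

# Hypothetical toy data: per-population allele frequencies (gnomAD-style
# population labels) and the gene each variant falls in.
VARIANTS = [
    {"id": "v1", "gene": "GENE_A", "af": {"afr": 0.12, "amr": 0.001, "eas": 0.0, "nfe": 0.002, "sas": 0.0}},
    {"id": "v2", "gene": "GENE_B", "af": {"afr": 0.20, "amr": 0.003, "eas": 0.15, "nfe": 0.001, "sas": 0.0}},
    {"id": "v3", "gene": "GENE_C", "af": {"afr": 0.005, "amr": 0.004, "eas": 0.0, "nfe": 0.30, "sas": 0.001}},
    {"id": "v4", "gene": "GENE_D", "af": {"afr": 0.0, "amr": 0.002, "eas": 0.08, "nfe": 0.001, "sas": 0.02}},
]

# Hypothetical undirected functional interaction edges.
EDGES = [("GENE_X", "GENE_A"), ("GENE_Y", "GENE_C")]


def ancestry_specific_variants(variants, high=0.05, low=0.01):
    """Map variant id -> population for variants with AF >= high in exactly
    one population and AF < low in every other population.
    The thresholds are illustrative assumptions."""
    hits = {}
    for v in variants:
        common = [pop for pop, af in v["af"].items() if af >= high]
        rare = [pop for pop, af in v["af"].items() if af < low]
        if len(common) == 1 and len(rare) == len(v["af"]) - 1:
            hits[v["id"]] = common[0]
    return hits


def network_neighbors(edges, seeds):
    """Return the seed genes plus their direct neighbors in the network."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    expanded = set(seeds)
    for gene in seeds:
        expanded |= adjacency[gene]
    return expanded


def prioritize_genes(variants, disease_genes, edges, **thresholds):
    """Genes carrying an ancestry-specific variant that are disease genes
    or direct interaction partners of a disease gene (sorted)."""
    hits = ancestry_specific_variants(variants, **thresholds)
    ancestry_genes = {v["gene"] for v in variants if v["id"] in hits}
    relevant = network_neighbors(edges, disease_genes)
    return sorted(ancestry_genes & relevant)


if __name__ == "__main__":
    print(ancestry_specific_variants(VARIANTS))
    print(prioritize_genes(VARIANTS, {"GENE_X"}, EDGES))
```

With this toy input, `v1` (common only in `afr`) and `v3` (common only in `nfe`) pass the filter, while `v2` is common in two populations and `v4` is at intermediate frequency in `sas`. Only `GENE_A` is both ancestry-specific and adjacent to the disease gene `GENE_X`. A real pipeline would read frequencies from gnomAD releases and use a curated interaction network, and its selection criteria may differ substantially from these assumptions.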
PhyloFrame also demonstrates how equitable AI methods can create gene signatures that apply across ancestries without requiring sample-specific ancestry information. Our model shows how AI can help mitigate ancestral bias in genomic data using preexisting data and contribute to equitable representation in medical research.

Citation Format: Leslie A. Smith, Kiley Graim, James A. Cahill. Prioritization of ancestry-specific variations in disease data helps to alleviate bias in genomic data [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr B012.