Elucidating ancestry-specific structures in admixed populations is crucial for understanding population history and mitigating confounding in genome-wide association studies. Existing methods generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals, computed after masking segments from ancestry components other than the one under investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce the ancestry-specific expected GRM (as-eGRM), a novel framework for estimating relatedness within ancestry components between admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on the genealogical trees encoded in the Ancestral Recombination Graph (ARG) together with local ancestry calls, and then computing the expectation of this ancestry-specific relatedness across the genome. Comprehensive evaluations using both simulated stepping-stone models of population structure and empirical datasets based on three-way admixed Latino cohorts showed that analysis based on as-eGRM robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM promises to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits in these understudied populations.