Abstract

The diversity in our genome is crucial to understanding the demographic history of worldwide populations. However, it remains unknown whether subtle genetic differences within a population can be disentangled, or whether they have an impact on complex traits. Here we apply dimensionality reduction methods (PCA, t-SNE, PCA-t-SNE, UMAP, and PCA-UMAP) to biobank-derived genomic data of a Japanese population (n = 169,719). Dimensionality reduction reveals fine-scale population structure, conspicuously differentiating adjacent insular subpopulations. We further elucidate the demographic landscape of these Japanese subpopulations using population genetics analyses. Finally, we perform phenome-wide polygenic risk score (PRS) analyses on 67 complex traits. Differences in PRS between the deconvoluted subpopulations are not always concordant with those in the observed phenotypes, suggesting that the PRS differences might reflect biases from the uncorrected structure, in a trait-dependent manner. This study suggests that such an uncorrected structure can be a potential pitfall in the clinical application of PRS.

Highlights

  • The diversity in our genome is crucial to understanding the demographic history of worldwide populations

  • Our results demonstrate that, in a phenotype-dependent manner, biases in polygenic risk score (PRS) would not be fully correctable even with pre-detection of the cryptic structures

  • We considered that the substructures identified by principal component analysis (PCA)–uniform manifold approximation and projection (UMAP) were concordant with those identified by fineSTRUCTURE

Summary

Introduction

The diversity in our genome is crucial to understanding the demographic history of worldwide populations. Principal component analysis (PCA), a classical dimensionality reduction method, has been a method of choice to uncover the large population structure[7,8]. This linear transformation was not sufficient to fully capture the fine and subtle genomic structure. UMAP and its combination with PCA (PCA–UMAP) are computationally fast and scalable for application to large genomic datasets[12]. These novel dimensionality reduction methods should be applied to the relatively understudied diverse populations worldwide to uncover the unknown fine-scale structure. Few admixture events have taken place after these migratory waves, and the population has been kept isolated within the mainland and the surrounding thousands of small islands. These unique situations represent an ideal scenario for the investigation of the fine-scale structure of neighboring yet isolated regions, which might be in contrast to admixed populations living on a continent.
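The description of PCA as a linear transformation can be made concrete with a small sketch. The example below is illustrative only and is not the authors' pipeline: it finds the first principal component of a toy genotype dosage matrix (individuals in rows, variants in columns, values 0/1/2) by power iteration in plain Python. The toy genotypes, the function names `center_columns` and `first_principal_component`, the starting vector, and the iteration count are all assumptions made for this illustration.

```python
import math


def center_columns(matrix):
    """Subtract each column (variant) mean so every column has mean zero."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def first_principal_component(matrix, iterations=100):
    """Return (loadings, scores) of PC1 using power iteration on X^T X.

    `loadings` is a unit-length vector with one weight per variant;
    `scores` holds the projection of each individual onto that vector.
    """
    x = center_columns(matrix)
    n_cols = len(x[0])
    # Deterministic starting vector (an assumption; a random start is also common).
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # Project individuals onto the current direction: X v
        projected = [sum(a * b for a, b in zip(row, v)) for row in x]
        # Map back to variant space: X^T (X v)
        w = [sum(x[i][j] * projected[i] for i in range(len(x)))
             for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance left to capture
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Hypothetical toy data: three individuals from each of two made-up groups.
genotypes = [
    [2, 2, 0, 1],  # group A
    [2, 1, 0, 1],  # group A
    [2, 2, 1, 1],  # group A
    [0, 0, 1, 1],  # group B
    [0, 1, 0, 1],  # group B
    [0, 0, 0, 1],  # group B
]

loadings, scores = first_principal_component(genotypes)
for label, score in zip("AAABBB", scores):
    print(label, round(score, 3))
```

Each PC1 score is a weighted sum of centered dosages, so only linear separation between groups can appear along it. In a combined approach such as PCA–UMAP, the leading principal component scores would presumably serve as input to the nonlinear embedding step, which is omitted here.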

