Abstract

Background: Genotype imputation from single-nucleotide polymorphism (SNP) genotype data using a haplotype reference panel consisting of thousands of unrelated individuals from populations of interest can help to identify strongly associated variants in genome-wide association studies. The Tohoku Medical Megabank (TMM) project was established to support the development of precision medicine, together with the whole-genome sequencing of 1070 human genomes from individuals in the Miyagi region (Northeast Japan) and the construction of the 1070 Japanese genome reference panel (1KJPN). Here, we investigated the performance of 1KJPN for genotype imputation of Japanese samples not included in the TMM project and compared it with other population reference panels.

Results: We found that the 1KJPN population was more similar to other Japanese populations, Nagahama (south-central Japan) and Aki (Shikoku Island), than to East Asian populations in the 1000 Genomes Project other than JPT, suggesting that the large-scale collection (more than 1000) of Japanese genomes from the Miyagi region covered many of the genetic variations of Japanese in mainland Japan. Moreover, 1KJPN outperformed the phase 3 reference panel of the 1000 Genomes Project (1KGPp3) for Japanese samples, and 1KJPN showed similar imputation rates for the TMM and other Japanese samples for SNPs with minor allele frequencies (MAFs) higher than 1%.

Conclusions: 1KJPN covered most of the variants found in the samples from areas of the Japanese mainland outside the Miyagi region, implying that 1KJPN is representative of the Japanese population’s genomes. 1KJPN and successive reference panels are useful genome reference panels for the mainland Japanese population. Importantly, the addition of whole genome sequences not included in the 1KJPN panel improved imputation efficiencies for SNPs with MAFs under 1% for samples from most regions of the Japanese archipelago.

Highlights

  • Genotype imputation from single-nucleotide polymorphism (SNP) genotype data using a haplotype reference panel consisting of thousands of unrelated individuals from populations of interest can help to identify strongly associated variants in genome-wide association studies [1]

  • The choice of a haplotype reference panel to maximize imputation performance has often been debated [2,3,4]. Haplotype reference panels are used to identify haplotypes of individual genomes genotyped by single-nucleotide polymorphism (SNP) arrays, and to estimate the genotypes missing in the SNP array data

  • Genetic diversity of Japanese and other East Asian populations: We compared the diversity of Japanese populations with the diversity of populations from elsewhere in East Asia to determine how 1KJPN might reflect these populations


Summary

Introduction

Genotype imputation from single-nucleotide polymorphism (SNP) genotype data using a haplotype reference panel consisting of thousands of unrelated individuals from populations of interest can help to identify strongly associated variants in genome-wide association studies. We investigated the performance of 1KJPN for genotype imputation of Japanese samples not included in the TMM project and compared it with other population reference panels. To enable high-density genotype imputation for SNPs with minor allele frequencies (MAFs) > 1% in a population, reference panels are preferably constructed from the whole-genome sequencing (WGS) of large samples [2, 6]. These studies suggest that increasing the sample sizes of population-specific haplotype reference panels is more effective for improving genotype imputation accuracy than aggregating the haplotype collection from worldwide resources, because the focus is on specific populations. Although recent studies in human population genetics have revealed clear regional variation in haplotype diversity, even within a single population [7], the influence of such variation on imputation performance has not yet been assessed.
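
The 1% MAF threshold used above to separate common from rare SNPs can be illustrated with a minimal sketch. It assumes biallelic SNPs whose genotypes are coded as alternate-allele counts (0, 1, or 2) with `None` for missing calls; the function names, the input layout, and the SNP identifiers are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the MAF of one biallelic SNP, or None if no genotype was called.

    Assumption: each genotype is an alternate-allele count (0, 1, or 2),
    and None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each diploid individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def split_by_maf(
    snps: Dict[str, Sequence[Optional[int]]], threshold: float = 0.01
) -> Tuple[List[str], List[str]]:
    """Split SNP IDs into (MAF > threshold, MAF <= threshold).

    SNPs with no called genotypes are skipped.
    """
    common: List[str] = []
    rare: List[str] = []
    for snp_id, genotypes in snps.items():
        maf = minor_allele_frequency(genotypes)
        if maf is None:
            continue
        (common if maf > threshold else rare).append(snp_id)
    return common, rare


# Hypothetical toy data: snpB carries a single alternate allele in 200 individuals.
example = {
    "snpA": [0, 1, 2, None],
    "snpB": [0] * 199 + [1],
    "snpC": [None],
}
common_snps, rare_snps = split_by_maf(example)
```

In this toy input, `snpB` has one alternate allele among 400 called alleles (MAF 0.25%), so it falls into the rare class, which is the class for which the abstract reports that adding further whole genome sequences improved imputation.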
