Abstract

Background
Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, giving the scientific community an opportunity to decipher the genetic architecture of the Korean population.

Methods
In this study, we applied two single nucleotide variant (SNV) calling pipelines to the raw whole-genome sequencing reads of the 35 Korean individuals in KPGP: reads were mapped with BWA and SOAP2, and SNVs were then called with SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were present only in the Korean population. We studied the mutation spectrum and analyzed the genes carrying non-synonymous SNVs detected only in the Korean population.

Results
We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 detected in all 35 Korean individuals (SNV-35) that were absent from the 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 existed in all 35 Korean individuals. The 5,754 genes carrying non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found that adhesion is the top disease term associated with SNV-1 and that Nelson syndrome is the only disease term associated with SNV-35. We also found that a significant number of Korean-only SNVs are in genes associated with the drug term adenosine.

Conclusion
We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
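
The consensus and filtering steps described in the Methods can be pictured as set operations. The following is a minimal sketch, not the authors' implementation: it assumes each SNV is keyed by chromosome, position, reference allele and alternate allele, and that two calls match only when all four fields agree. The function names and the toy coordinates are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: an SNV is identified by (chromosome, position, ref allele, alt allele).
SNV = tuple[str, int, str, str]


def consensus_calls(pipeline_a: set[SNV], pipeline_b: set[SNV]) -> set[SNV]:
    """Keep only the SNVs reported by both calling pipelines for one individual."""
    return pipeline_a & pipeline_b


def population_specific(
    per_individual: list[set[SNV]], reference_panel: set[SNV]
) -> tuple[set[SNV], set[SNV]]:
    """Return SNVs absent from the reference panel that occur in
    (a) at least one individual and (b) every individual."""
    counts = Counter(snv for calls in per_individual for snv in calls)
    n_individuals = len(per_individual)
    in_any = {snv for snv in counts if snv not in reference_panel}
    in_all = {snv for snv in in_any if counts[snv] == n_individuals}
    return in_any, in_all


# Toy calls for two individuals from two hypothetical pipelines.
calls_a = [
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
]
calls_b = [
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")},
]
# Toy stand-in for variants already catalogued in other populations.
reference_panel = {("chr2", 50, "G", "A")}

per_individual = [consensus_calls(a, b) for a, b in zip(calls_a, calls_b)]
in_any, in_all = population_specific(per_individual, reference_panel)
```

In this toy setting, `in_any` plays the role of SNV-1 and `in_all` the role of SNV-35. Real call sets would be read from variant files rather than written out by hand.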

Highlights

  • Genetics is the key to deciphering the phenotypic diversity of different populations

  • Through comparative analyses with genetic variants detected by HapMap and 1KGP, we identified the single nucleotide variants (SNVs) found only in the Korean population and explored their associations with functional pathways, disease terms and drug terms

  • Read alignment and SNV calling: A total of 3.8 TB of raw whole genome sequencing data from 35 Korean samples was acquired from the Korean Personal Genomes Project (KPGP), with depth in each sample ranging from 30X to 42X (Additional file 2)


Summary

Introduction

Genetics is the key to deciphering the phenotypic diversity of different populations. Genome-wide association studies (GWAS) have been utilized to identify common variants associated with different phenotypes among populations [5,6,7]. The HapMap Project [21] and the 1000 Genomes Project (1KGP) [22] are two large efforts to decipher the variants in various populations by microarray and NGS, respectively. More than 38 million variants were identified from 1,887 normal samples of 17 populations by HapMap [23,24] and 1KGP [25,26], providing comprehensive references of variants for genetic studies. Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, giving the scientific community an opportunity to decipher the genetic architecture of the Korean population.
