Abstract

DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first questioned whether published genome-wide association studies (GWASs) assays cover all regions well in the genome. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 Kb that do not contain any common SNPs, i.e., those with minor allele frequency (MAF)≥1%. Secondly, because conserved regions are generally of functional importance, we tested genes in those large genomic regions without common SNPs. We found 97 genes and were enriched for reproduction function. In addition, we further filtered out regions with CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from Human Genome Project and common variants identified by personal genome sequencing (UCSC). No region survived after those filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some “holes” in the current human genomic map for common SNPs. Because GWAS only focused on common SNPs, interpretation of GWAS results should take this limitation into account. Particularly, two recent GWAS of fertility may be incomplete due to the map deficit. Additional SNP discovery efforts should pay close attention to these regions.

Highlights

  • The human genome contains millions of common single nucleotide polymorphisms (SNPs), which are being deposited into public databases

  • We identified 50 common SNP-free regions (CSFRs) distributed across eight chromosomes: chr1, chr2, chr7, chr9, chr10, chr16, chrX, and chrY

  • We performed a thorough search for large genomic regions that are free of common variants in dbSNP and we found 50 CSFRs and 20 common variant-free regions (CVFRs)

Read more

Summary

Introduction

The human genome contains millions of common SNPs, which are being deposited into public databases. These data have been used to design genome-wide association studies (GWASs) [1,2,3]. Genomic regions not covered by common variants are neglected. Those neglected regions may contain variants with low frequencies, and should be paid more attention to because rare variants are even more likely to be functional than common ones [5]. We were interested in two questions: 1) whether the human genome is sufficiently covered by common SNPs and is sufficiently captured by common SNPs of standard GWAS platforms, and 2) whether any genes were included in those regions and their enriched biological functions

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call