Abstract

Background: The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.

Methods: CNVs were detected in 97 unrelated Qatari individuals by running two calling algorithms on each of two primary datasets: high-resolution genotyping (Illumina Omni 2.5M) and high-depth whole-genome sequencing (Illumina PE 100bp). The four call-sets were integrated to identify high-confidence CNV regions, which were subsequently annotated for putative functional effect and compared to public databases of CNVs in other populations. The availability of genome sequence was leveraged to identify tagging SNPs in high LD with common deletions in this population, enabling their imputation from genotyping experiments in the future.

Results: Genotyping intensities and genome sequencing data from 97 Qataris were analyzed with four different algorithms and integrated to discover 16,660 high-confidence CNV regions (CNVRs) in the total population, affecting ~28 Mb in the median Qatari genome. Up to 40 % of all CNVs affected genes, including novel CNVs affecting Mendelian disease genes, segregating at different frequencies in the three major Qatari subpopulations, including those with Bedouin, Persian/South Asian, and African ancestry. Consistent with high consanguinity levels in the Bedouin subpopulation, we found an increased burden of homozygous deletions in this group. In comparison to known CNVs in the comprehensive Database of Genomic Variants, we found that 5 % of all CNVRs in Qataris were completely novel, with an enrichment of CNVs affecting several known chromosomal disorder loci and genes known to regulate sugar metabolism and type 2 diabetes in the Qatari cohort. Finally, we leveraged the availability of genome sequence to find suitable tagging SNPs for common deletions in this population.

Conclusion: We combine four independently generated datasets from 97 individuals to study CNVs for the first time at high resolution in a Gulf Arab population.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1991-5) contains supplementary material, which is available to authorized users.
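
As an illustration of the tagging-SNP idea mentioned in the abstract, the sketch below scores how well a SNP tracks a deletion by the squared Pearson correlation of their per-individual dosages (0, 1, or 2 copies). This is a common genotype-level approximation of LD, not necessarily the procedure used in the study. The function names, the toy dosage vectors, and the `min_r2=0.8` cutoff are all assumptions made for this example.

```python
def genotype_r2(snp_dosages, del_dosages):
    """Squared Pearson correlation between two dosage vectors (values 0, 1, 2)."""
    n = len(snp_dosages)
    if n == 0 or n != len(del_dosages):
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_s = sum(snp_dosages) / n
    mean_d = sum(del_dosages) / n
    cov = sum((s - mean_s) * (d - mean_d)
              for s, d in zip(snp_dosages, del_dosages))
    var_s = sum((s - mean_s) ** 2 for s in snp_dosages)
    var_d = sum((d - mean_d) ** 2 for d in del_dosages)
    if var_s == 0 or var_d == 0:
        # A monomorphic variant carries no information about the other one.
        return 0.0
    return (cov * cov) / (var_s * var_d)


def best_tag_snp(del_dosages, snp_panel, min_r2=0.8):
    """Return (snp_id, r2) for the best-correlated SNP, or None if below min_r2.

    min_r2 is an illustrative threshold, not a value taken from the paper.
    """
    best_id, best_r2 = None, 0.0
    for snp_id, dosages in snp_panel.items():
        r2 = genotype_r2(dosages, del_dosages)
        if r2 > best_r2:
            best_id, best_r2 = snp_id, r2
    if best_id is not None and best_r2 >= min_r2:
        return best_id, best_r2
    return None


# Toy data: deletion copy counts for eight individuals and three hypothetical SNPs.
deletion = [0, 1, 2, 0, 1, 0, 0, 1]
panel = {
    "snp_a": [0, 1, 2, 0, 1, 0, 0, 1],  # tracks the deletion exactly
    "snp_b": [0, 0, 2, 0, 1, 0, 1, 1],  # partially correlated
    "snp_c": [1, 1, 1, 1, 1, 1, 1, 1],  # monomorphic in this sample
}
print(best_tag_snp(deletion, panel))
```

A SNP that passes such a threshold could stand in for the deletion in array-only cohorts. In practice, phased haplotypes and a dedicated LD tool would give more reliable estimates than this dosage-based shortcut.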

Highlights

  • The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations

  • We present this study as the first systematic approach that makes available data on a large number of copy number variants (CNVs) detected from both genotyping and sequencing platforms in this previously under-explored population. We believe the CNV frequencies reported in this study could be used to assess the pathogenicity of clinically detected CNVs in Gulf Arab patients that appear novel relative to public databases

  • This kind of population specificity will have detrimental effects on the design of population-specific clinical CGH-arrays, and on the interpretation and assignment of pathogenicity of CNV findings in individuals from different ethnic populations [25, 51, 52, 53, 54]. This issue is of special relevance to Arabs in general and Qataris in particular, for whom there is a paucity of databases of CNVs observed in healthy controls

Introduction

The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. The Qatari peninsula, located on the eastern coast of the Arabian Peninsula, is at a major crossroads of human migration [1]. This geographical location has led to several waves of settlement over the past millennia, creating unique ethnic ancestries that form the present-day population. Whereas classic Mendelian disorders are generally defined by single nucleotide polymorphisms (SNPs), there is increasing evidence that copy number variations (CNVs) play a significant role in disease [6,7,8,9]. Though common CNVs have traditionally shown relatively poor association with common disease [12], intermediate frequency (1-10 %) and rare (

