Abstract

Genetic variation in populations of Middle Eastern origin remains highly underrepresented in most comprehensive genomic databases. This underrepresentation hampers the functional annotation of the human genome and challenges accurate clinical variant interpretation. To highlight the importance of capturing genetic variation in the Middle East, we aggregated whole-exome and whole-genome sequencing data from 2116 individuals in the Middle East and established the Middle East Variation (MEV) database. Of the high-impact coding (missense and loss of function) variants in this database, 53% were absent from the most comprehensive Genome Aggregation Database (gnomAD), thus representing a unique Middle Eastern variation dataset which might directly impact clinical variant interpretation. We highlight 39 variants with minor allele frequency >1% in the MEV database that were previously reported as rare disease variants in ClinVar and the Human Gene Mutation Database (HGMD). Furthermore, the MEV database contained 281 putative homozygous loss of function (LoF) variants, or complete knockouts, of which 31.7% (89/281) were absent from gnomAD. This set either represents complete knockouts of 83 unique genes in reportedly healthy individuals, with implications for disease penetrance and expressivity, or may affect dispensable exons, thus refining the clinical annotation of those regions. Intriguingly, 24 of those genes have several clinically significant variants reported in ClinVar and/or HGMD. Our study shows that genetic variation in the Middle East improves functional annotation and clinical interpretation of the genome and emphasizes the need for expanding sequencing studies in the Middle East and other underrepresented populations.

Highlights

  • Cataloguing human genetic variation at an unprecedented scale has significantly improved the clinical interpretation of genetic variants found in patients with Mendelian disorders [1]

  • This study demonstrates the importance of capturing genetic variation in the Middle East and highlights the integration of different variant datasets to improve the clinical annotation of the human genome

  • We focus on the set of high-impact coding (missense, stop gain/loss, splice acceptor/donor (±1, 2), frameshift) variants (n = 600,987) affecting RefSeq transcripts/exons (Methods), given that such variants represent the majority of disease variants [11]
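
The selection described above (keep high-impact coding consequences, then compare against a reference catalogue and an allele-frequency threshold) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, the consequence labels, the variant keys, and the toy data are all hypothetical, and the 1% threshold is taken from the abstract only as an example value.

```python
from dataclasses import dataclass

# Illustrative consequence labels (assumption); real annotation tools
# use their own controlled vocabularies.
HIGH_IMPACT = {
    "missense", "stop_gained", "stop_lost",
    "splice_acceptor", "splice_donor", "frameshift",
}

@dataclass(frozen=True)
class Variant:
    key: str            # hypothetical "chrom-pos-ref-alt" identifier
    consequence: str    # annotated coding consequence
    alt_count: int      # number of alternate alleles observed
    total_alleles: int  # number of alleles successfully called

def minor_allele_frequency(v: Variant) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if v.total_alleles == 0:
        return 0.0
    af = v.alt_count / v.total_alleles
    return min(af, 1.0 - af)

def high_impact(variants):
    """Keep only variants whose consequence is in the high-impact set."""
    return [v for v in variants if v.consequence in HIGH_IMPACT]

def fraction_absent(variants, reference_keys) -> float:
    """Share of variants whose key is not found in a reference catalogue."""
    if not variants:
        return 0.0
    absent = sum(1 for v in variants if v.key not in reference_keys)
    return absent / len(variants)

# Toy data, invented purely for illustration.
variants = [
    Variant("1-100-A-G", "missense", 30, 1000),
    Variant("2-200-C-T", "synonymous", 500, 1000),
    Variant("3-300-G-A", "frameshift", 2, 1000),
    Variant("4-400-T-C", "stop_gained", 980, 1000),
]
reference_keys = {"1-100-A-G"}  # stand-in for a gnomAD-like catalogue

selected = high_impact(variants)
absent_share = fraction_absent(selected, reference_keys)
common = [v.key for v in selected if minor_allele_frequency(v) > 0.01]
```

Here `absent_share` plays the role of a statistic such as "53% absent from gnomAD", and `common` lists high-impact variants above the 1% threshold; matching by a simple string key is a simplification, since real comparisons need normalized variant representations.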

Summary

Introduction

Cataloguing human genetic variation at an unprecedented scale has significantly improved the clinical interpretation of genetic variants found in patients with Mendelian disorders [1]. The 1000 Genomes Project created a catalogue of human genetic variation by applying whole-exome sequencing (WES) and whole-genome sequencing (WGS) to 2504 individuals from 26 different populations [2]. This project characterized over 88 million variants in the human genome, including >99% of single nucleotide variants (SNVs) with a frequency of >1% for a variety of ancestries. Large-scale reference data sets established by the Exome Aggregation Consortium (ExAC) [4], aggregating 60,706 exome sequences, provided a more comprehensive summary of human genome variation; later, the Genome Aggregation Database (gnomAD) aggregated 125,748 exome sequences in addition to 15,708 whole-genome sequences of unrelated individuals from various ancestries. These publicly available datasets are a valuable resource for the clinical and scientific community. This study demonstrates the importance of capturing genetic variation in the Middle East and highlights the integration of different variant datasets to improve the clinical annotation of the human genome.

Study Cohort
Middle East Variation (MEV) Database
Class I
Class II
Common Middle East Disease Variants (CMEDVs)