Abstract

Whole genome sequencing and sophisticated bioinformatics techniques have recently been used with success to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of the patient. This is because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes from other individuals of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes, in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We used a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5,000 coding and ∼7,000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels composed of genomes from individuals who do not share the patient’s ancestral background can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.

Highlights

  • Variant identification: From the 52 individual genomes we identified 24,277,549 “non-reference” variants that deviated from build hg18 of the human reference genome represented in the UCSC browser (Mangan et al, 2009; Fujita et al, 2011).

  • Simulation study results using known pathogenic variants: As emphasized in the Introduction, two factors go into the inference that a variant is likely to be pathogenic and causative of an idiopathic condition: the variant must be unique to the patient with the condition (i.e., “novel”) and it must be predicted to be functional.

Introduction

Whole genome sequencing (WGS) has enabled the search for inherited DNA sequence variants that are responsible for idiopathic diseases affecting a number of individuals (Biesecker et al, 2009; Lupski et al, 2010; Roach et al, 2010; Bainbridge et al, 2011; Worthey et al, 2011; Lyon and Wang, 2012). The strategy for identifying such variants is intuitive and rests on two reasonable assumptions: first, that the responsible variants are unique to the individuals affected by the diseases; and second, that these variants have molecular effects pronounced enough to be detected by available bioinformatic analyses (Rope et al, 2011; Yandell et al, 2011). The strategy is not necessarily trivial to implement, however. Many diseases, idiopathic or not, have a complex molecular basis even though their pronounced clinical phenotypes and severe health consequences appear, on the surface, to result from a single, rare, or overtly monogenic genomic perturbation. Such diseases may require a different strategy for identifying their genetic determinants (Biesecker et al, 2009; Gonzaga-Jauregui et al, 2012).
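
The two assumptions above amount to a simple filter, sketched here in Python purely for illustration. This is not the authors' pipeline: the `Variant` record, the `candidate_variants` function, the toy variants, and the lookup-based functional predictor are all hypothetical names and data introduced for the example. The sketch keeps a patient variant only if it is absent from every genome in the reference panel (novel) and is predicted to be functional, and it shows how a panel that lacks the patient's population-specific variants leaves more candidates to sort through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a real file-format parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_variants(patient, reference_panel, is_functional):
    """Return patient variants that are novel and predicted functional.

    patient:          set of Variant objects found in the patient's genome
    reference_panel:  iterable of sets of Variant objects, one per panel genome
    is_functional:    callable Variant -> bool (assumed external predictor)
    """
    # A variant seen in any panel genome is not novel.
    seen_in_panel = set().union(*reference_panel)
    return {v for v in patient if v not in seen_in_panel and is_functional(v)}


# Toy data, invented for illustration only.
common = Variant("1", 1000, "A", "G")         # common in all populations
pop_specific = Variant("2", 2000, "C", "T")   # common only in the patient's population
causal = Variant("3", 3000, "G", "A")         # private to the patient, functional
private_neutral = Variant("4", 4000, "T", "C")  # private to the patient, not functional

patient = {common, pop_specific, causal, private_neutral}
predicted_functional = {pop_specific, causal}


def predictor(v):
    # Stand-in for a real functional-effect predictor.
    return v in predicted_functional


matched_panel = [{common, pop_specific}, {common}]  # same ancestry as the patient
mismatched_panel = [{common}, {common}]             # different ancestry

matched_hits = candidate_variants(patient, matched_panel, predictor)
mismatched_hits = candidate_variants(patient, mismatched_panel, predictor)
print(len(matched_hits), len(mismatched_hits))
```

With the ancestry-matched panel only the private functional variant survives the filter, whereas the mismatched panel also lets the population-specific variant through as a false candidate.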
