Abstract

Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aimed at unravelling the genetic component of rare and complex disorders. While in rare disorders this has allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains unresolved. Despite rapid advances in next-generation sequencing, both the technology and the analysis of the data it produces are still in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and of more complex variants (e.g. insertions and deletions), their role in disease is as yet far from elucidated.

We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified. This validation indicated that most of the private and rare variants were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low-coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next-generation sequencing.

Highlights

  • The progressively decreasing costs of next-generation sequencing, combined with targeted approaches such as exome sequencing, have allowed rapid deployment of this technology in a variety of contexts: population studies, familial cases of disease, and complex diseases

  • We processed 173 exomes from different diseases, using Novoalign, Dindel and our own annotation script based on the ENSEMBL API

  • Our single nucleotide variant (SNV) data are consistent with those recently published by MacArthur and colleagues; here we focus our analysis on insertion/deletion variants (INDELs), which have so far been largely under-investigated


Summary

Introduction

The progressively decreasing costs of next-generation sequencing, combined with targeted approaches such as exome sequencing, have allowed rapid deployment of this technology in a variety of contexts: population studies, familial cases of disease, and complex diseases. Exome sequencing represents a cost-effective strategy for the identification of causal variants, especially in a clinical context [1] [2], where clinical information and familial history may aid in the identification of the causal genetic variant within a coding region. Several population-based studies have so far provided a general overview of variation in the human genome. The 1000 Genomes Project Consortium has provided the first whole-genome overview in control populations, indicating that the majority of SNVs (87.28%) are already found in dbSNP [3]. Another exome sequencing study, on a control population of 200 individuals from Denmark with an average coverage of 126-fold, showed an excess of SNPs annotated as low-frequency (2–5%) non-synonymous coding variants [5].

