Abstract

Massively Parallel Sequencing (MPS) now allows entire exomes and genomes to be sequenced at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. For complex diseases, however, study designs must accommodate substantial locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few “true” disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues, we performed simulation-based analyses comparing several MPS strategies for complex disease. Factors examined include genetic architecture, sample size, the number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes, and concordance of genetic variants within pedigrees. We assumed a two-stage design in which genes identified by MPS of high-risk families are evaluated in a secondary screening phase in a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree are effective strategies for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree reduces power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
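To make the filters and the cost function described above more concrete, here is a minimal Python sketch. It is not the authors' simulation code. The `Variant` record, the function names, the definition of concordance as "every sequenced member of the pedigree carries the same variant", and the assumption that stage 2 screens every surviving gene in every stage-2 proband are all hypothetical illustrations. Only the 400:1 exome-to-single-gene cost ratio comes from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: cost is measured in units of "one candidate gene sequenced in one
# person". Only the 400:1 exome-to-gene ratio is taken from the abstract.
EXOME_COST_UNITS = 400


@dataclass(frozen=True)
class Variant:
    """Hypothetical call record for one variant observed in one sequenced individual."""
    pedigree: str
    individual: str
    gene: str
    variant_id: str
    vtype: str  # e.g. "missense", "nonsense", "synonymous"


def candidate_genes(variants, sequenced, keep_types, min_pedigrees):
    """Apply three illustrative filters and return the surviving gene names.

    `sequenced` maps each pedigree to the set of its members that were sequenced.
    """
    # 1. Variant-type filter; collect the carriers of each variant within a pedigree.
    carriers = defaultdict(set)
    for v in variants:
        if v.vtype in keep_types:
            carriers[(v.pedigree, v.gene, v.variant_id)].add(v.individual)

    # 2. Concordance filter: every sequenced member of the pedigree carries the variant.
    pedigrees_per_gene = defaultdict(set)
    for (ped, gene, _variant), who in carriers.items():
        if who == sequenced.get(ped, set()):
            pedigrees_per_gene[gene].add(ped)

    # 3. Multiple-observation filter: the gene must pass in >= min_pedigrees pedigrees.
    return sorted(g for g, peds in pedigrees_per_gene.items()
                  if len(peds) >= min_pedigrees)


def two_stage_cost(n_exomes, n_genes, n_probands):
    """Stage 1: whole exomes; stage 2 (assumed): each candidate gene in each proband."""
    return n_exomes * EXOME_COST_UNITS + n_genes * n_probands


if __name__ == "__main__":
    sequenced = {"P1": {"a", "b"}, "P2": {"c", "d"}, "P3": {"e", "f"}}
    calls = [
        Variant("P1", "a", "GENE1", "v1", "missense"),
        Variant("P1", "b", "GENE1", "v1", "missense"),
        Variant("P2", "c", "GENE1", "v2", "nonsense"),
        Variant("P2", "d", "GENE1", "v2", "nonsense"),
        Variant("P3", "e", "GENE2", "v3", "missense"),    # f lacks it: not concordant
        Variant("P3", "e", "GENE3", "v4", "synonymous"),  # removed by the type filter
        Variant("P3", "f", "GENE3", "v4", "synonymous"),
    ]
    genes = candidate_genes(calls, sequenced, {"missense", "nonsense"}, min_pedigrees=2)
    print(genes)                                # ['GENE1']
    print(two_stage_cost(6, len(genes), 1000))  # 3400
```

In this toy example only GENE1 survives with `min_pedigrees=2`; raising the threshold to 3 removes it as well, a small-scale illustration of the over-filtering risk noted in the abstract. The stage-2 term grows with both the number of surviving genes and the number of probands screened, so filter stringency and the number of exomes sequenced per pedigree both feed into the overall cost.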

Highlights

  • Our overall approach is modeled on the two-stage GWAS design in which cases and controls are assayed on dense (500K–1M) SNP chips, followed

Introduction

Over the past two decades, advances in genome technology have greatly facilitated the discovery of genetic variants that confer increased susceptibility to disease: first through genetic maps of microsatellite markers, which allowed mapping and positional cloning of relatively high-penetrance disease-predisposing mutations in genes such as BRCA1 [MIM 113705] and BRCA2 [MIM 600185], and later through high-throughput, relatively low-cost platforms containing 300,000–1,000,000 single nucleotide polymorphisms (SNPs), which have enabled the identification of common genetic variants conferring only modest increases in disease risk. Over 800 disease susceptibility loci in ~150 human diseases/traits have been identified at genome-wide significance [1], validating this approach. While further genome-wide association studies (GWAS) using ever-higher-density arrays and larger sample sizes, and also assaying copy number variation, will account for some of the missing genetic effect, it is unlikely that the proportion explained will increase markedly via this approach. Massively Parallel Sequencing (MPS) provides an order-of-magnitude improvement in throughput over Sanger sequencing, enabling “genome-wide” sequencing applications in single sample preparations. There are several ways in which such technology can identify disease-associated genetic variation: through resequencing of genomic regions implicated by GWAS, in the expectation of identifying rarer, higher-risk variants that are tagged by the SNPs arrayed on GWAS platforms [2], and through the possible identification of rare high-penetrance mutations in several disease susceptibility genes.
