Abstract

We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information about sample identity. DNA from 576 individuals was arranged into four 12-row by 12-column matrices and then pooled by row and by column, resulting in 96 total pools with 12 individuals in each pool. Pooling was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix, we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched for a 250 kb region previously identified by GWAS as significantly predisposing individuals to lung cancer. All 96 pools (12 row and 12 column pools from each of 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification by Ion PGM sequencing confirmed the presence of 91.4% of the confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools, providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
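
The row and column logic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`pools_for`, `candidate_carriers`, `is_unambiguous`) and the assumption that samples are numbered 0 to 575 and fill each matrix row by row are hypothetical, and the sketch assumes error-free variant calls in every pool.

```python
from itertools import product

N = 12            # rows and columns per matrix, as stated in the abstract
N_MATRICES = 4    # four matrices, 576 individuals in total
TOTAL_POOLS = N_MATRICES * (N + N)  # 12 row pools + 12 column pools per matrix


def pools_for(sample_index):
    """Map a 0-based sample index to (matrix, row, column).

    Assumption: samples fill each matrix row by row. Each sample then
    belongs to exactly one row pool and one column pool of its matrix.
    """
    matrix, within = divmod(sample_index, N * N)
    row, col = divmod(within, N)
    return matrix, row, col


def candidate_carriers(row_hits, col_hits):
    """Return the (row, column) positions in one matrix that could carry
    a variant seen in the given row pools and column pools."""
    return sorted(product(row_hits, col_hits))


def is_unambiguous(row_hits, col_hits):
    """With error-free calls, every candidate is a true carrier when only
    one row pool or only one column pool is positive."""
    return bool(row_hits) and bool(col_hits) and min(len(row_hits), len(col_hits)) == 1


# A variant seen in row pool 3 and column pool 7 points to a single sample.
single = candidate_carriers({3}, {7})

# Two positive rows and two positive columns give four candidates,
# so carriers cannot be resolved without further information.
ambiguous = candidate_carriers({1, 4}, {2, 9})
```

Under these assumptions, a rare variant carried by one individual in a matrix appears in exactly one row pool and one column pool, so their intersection identifies the carrier. When several rows and several columns are positive, the intersections only narrow the set of candidates.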

Introduction

Genome-wide association studies (GWAS) provide a wealth of information about the genetic basis of disease. As regions of the genome that are involved in pathogenesis are identified, there is a need for improved fine mapping of disease-associated genetic variants across a large number of individuals. Sample pooling is a frequently applied method for sequencing a large number of samples in order to detect variants. Targeted enrichment of specific regions of interest prior to pooling can increase the number of samples processed using current sequencing technologies. Bioinformatics tools such as VarScan and CRISP exist for single nucleotide variant (SNV) calling from pooled samples, but they cannot identify the specific samples in the pool that contributed the variant [1] [2]. Improved methods are required to enable a degree of sample deconvolution for DNA that is pooled prior to library preparation for sequencing.
