Abstract

Background: Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, identifying rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and at a dramatically reduced cost.

Results: Based on quantitative group testing, we propose an overlapping pool sequencing strategy that allows the efficient recovery of variant carriers among numerous individuals at much lower cost than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of the sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5–97.9% of variants, at variant frequencies ranging from 0.5 to 1.5%.

Conclusions: Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
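The pooling idea described in the Results can be illustrated with a minimal sketch. It assumes a noiseless setting and uses a simple elimination decoder from classical group testing, not the heuristic Bayesian decoder or the read-count threshold described above. The function names, the pool count (40), and the pool weight (k = 5) are hypothetical choices for illustration; only the sample count of 200 comes from the abstract.

```python
import random

def random_k_set_design(n_samples, n_pools, k, seed=0):
    """Assign each sample to k distinct pools chosen uniformly at random."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_pools), k)) for _ in range(n_samples)]

def positive_pools(design, carriers):
    """Noiseless model (assumption): a pool is positive if it holds any carrier."""
    positives = set()
    for sample in carriers:
        positives |= design[sample]
    return positives

def candidate_carriers(design, positives):
    """Elimination decoding: keep samples whose pools are all positive.

    This is a stand-in for illustration, not the paper's Bayesian decoder.
    """
    return [s for s, pools in enumerate(design) if pools <= positives]

# 200 samples as in the abstract; 40 pools and k = 5 are assumed values.
design = random_k_set_design(n_samples=200, n_pools=40, k=5, seed=1)
carriers = {17}  # one carrier among 200 samples, i.e. 0.5% frequency
positives = positive_pools(design, carriers)
candidates = candidate_carriers(design, positives)
print(len(positives), "positive pools;", len(candidates), "candidate(s)")
```

In this noiseless sketch the true carrier always survives elimination, but other samples may survive too. Resolving such ambiguity is where quantitative information such as read counts becomes useful.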

Highlights

  • Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases

  • To further reduce the cost of large-scale screens for rare variant carriers, several techniques based on group testing theory [8] and compressed sensing [9,10] have been used to construct overlapping pool sequencing strategies

  • We propose an efficient random overlapping pool sequencing strategy with quantitative group testing for the identification of rare variant carriers using massively parallel sequencing data


Summary

Introduction

Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. To further reduce the cost of large-scale screens for rare variant carriers, several techniques based on group testing theory [8] and compressed sensing [9,10] have been used to construct overlapping pool sequencing strategies, which can identify carriers in large-scale samples with few pooled sequencing experiments. These strategies have reduced the number of sequencing experiments needed for rare variant carrier identification and further lowered the cost [11,12,13,14].
