Abstract

Identifying single nucleotide polymorphisms (SNPs) from pooled samples is critical for many studies and applications. SNPs called from next-generation sequencing data may suffer from errors in both base calling and read mapping. Taking advantage of dual mononucleotide addition-based pyrosequencing, we present Epds, a method to efficiently identify SNPs from pooled DNA samples. Because dual mononucleotide addition-based pyrosequencing yields only five patterns of non-synchronistic extension between the wild and mutant sequences, we employed an enumerative algorithm to infer the mutant locus and estimate the proportion of the mutant sequence. From the profiles of three runs with distinct dual mononucleotide additions, Epds could recover the mutant bases. Results showed that our method had a false-positive rate of less than 3%. A series of simulations revealed that Epds outperformed the current method (PSM) in many situations. Finally, experiments based on profiles produced by real sequencing demonstrated that our method could be successfully applied to identify mutants from pooled samples. The software implementing this method and the experimental data are available at http://bioinfo.seu.edu.cn/Epds.
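To make the idea of a dual mononucleotide addition profile concrete, the following is a minimal, idealized sketch; it is not the Epds algorithm itself. It assumes that each flow adds two nucleotides together, that the signal equals the number of bases incorporated in that flow, that the two complementary pairs are added alternately, and that the three runs use the three possible pairings of A, C, G and T. The sequence is treated as the strand being synthesized, noise is ignored, and all names (`PAIRINGS`, `dual_profile`, `pooled_profile`) are hypothetical.

```python
# The three ways to split {A, C, G, T} into two complementary pairs
# (assumed here to correspond to the three runs mentioned in the abstract).
PAIRINGS = {
    "AG/CT": ("AG", "CT"),
    "AC/GT": ("AC", "GT"),
    "AT/CG": ("AT", "CG"),
}


def dual_profile(seq, pairing, n_flows):
    """Idealized signal per flow when the two pairs are added alternately.

    In each flow the strand is extended through every consecutive base
    that belongs to the pair being added; the signal is that base count.
    """
    first, second = pairing
    profile = []
    pos = 0
    for flow in range(n_flows):
        bases = first if flow % 2 == 0 else second
        count = 0
        while pos < len(seq) and seq[pos] in bases:
            pos += 1
            count += 1
        profile.append(count)
    return profile


def pooled_profile(wild, mutant, fraction, pairing, n_flows):
    """Noise-free mixture of wild and mutant profiles.

    `fraction` is the assumed proportion of mutant sequence in the pool.
    """
    w = dual_profile(wild, pairing, n_flows)
    m = dual_profile(mutant, pairing, n_flows)
    return [(1 - fraction) * a + fraction * b for a, b in zip(w, m)]


if __name__ == "__main__":
    wild = "AGCTTAC"
    mutant = "AGCTCAC"  # hypothetical T>C substitution at position 5
    for name, pairing in PAIRINGS.items():
        print(name, pooled_profile(wild, mutant, 0.25, pairing, 6))
```

In this toy example the T>C substitution leaves the AG/CT profile unchanged, because C and T are added in the same flow, while the AC/GT run shifts signal between flows. This illustrates why a single run can miss a substitution and why profiles from several distinct pairings are informative.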

Highlights

  • A single nucleotide polymorphism (SNP) is a variation among individuals at a single position in a DNA sequence

  • Using a dual mononucleotide addition-based sequencing technique, we present Epds, a method to identify SNPs from pooled DNA samples

  • Based on the only five patterns of non-synchronistic extension that arise between wild and mutant sequences under dual mononucleotide addition-based sequencing, we propose an enumerative algorithm to infer the mutant locus and estimate the proportion of the mutant sequence

Summary

Introduction

A single nucleotide polymorphism (SNP) is a variation among individuals at a single position in a DNA sequence. SNPs are the most widely used molecular markers in many genetic studies because of their abundance and high potential for automation (Kumar et al 2012). SNPs occur roughly once every 1000–2000 bases when two human chromosomes are compared (Sachidanandam et al 2001; Sherry et al 2001). SNPs are becoming powerful tools for identifying genetic factors and have been applied in many fields (Liao and Lee 2010), including human forensics and diagnostics (Brenner and Weir 2003; McCarthy et al 2008), animal and crop breeding (Lagudah et al 2009; Schennink et al 2009), and improvement of biomolecule production (Lee and Lee 2005; Snyder and Francis 2005; Nijland et al 2007). SNPs determined from sequencing results may suffer from incorrect base calling and misaligned reads (Nielsen et al 2011). Before SNP data are applied, the discovered SNPs must be validated to identify the true SNPs.
