Abstract

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism.

Highlights

  • Assessing the genetic differences between individuals within a species or between chromosomes of an individual is a fundamental task in many aspects of biology

  • Results presented in this paper show that DISCOSNP outperforms other reference-free single nucleotide polymorphisms (SNPs) detection methods in terms of resources, type and number of input dataset(s), and quality of the ranking of predicted isolated SNPs

  • We propose experiments that aim at (i) assessing the quality of DISCOSNP results on simulated datasets, in comparison with state-of-the-art reference-free SNP detection methods; (ii) showing how DISCOSNP performs on real data, with biological validation

Read more

Summary

Introduction

Assessing the genetic differences between individuals within a species or between chromosomes of an individual is a fundamental task in many aspects of biology. This is increasingly feasible with next-generation sequencing technologies, as individuals from virtually any species can be sequenced at a modest cost. To be amplified by polymerase chain reaction (PCR), such SNPs must not be surrounded by other polymorphism sources, i.e. other SNPs, indels or structural variants. Isolated SNPs must be distant to the left and to the right by at least k nucleotides from any other polymorphism, k being one of the main parameters of a SNP detection tool

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.