Abstract

Background

Oligonucleotide signatures (signatures) have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept. Existing signature design programs for sequence signatures (signatures matching exactly one sequence) or clade signatures (signatures matching every sequence in a phylogenetic clade) cannot identify all possible polymorphic sites in highly similar sequences, and they perform poorly on large genome sequencing datasets.

Results

We introduce cluster signatures: subsequences that match perfectly and exclusively any group of sequences in a data set. Cluster signatures provide complete recall for primer/probe design and increased discrimination between sequences beyond that of clade signatures. Using cluster signatures for in silico identification of HTS targets achieves good precision/recall and running time performance. This method has been implemented in an open-source tool, the Automated Oligonucleotide Design Pipeline (aodp), included in supplementary material and available at: https://bitbucket.org/wenchen_aafc/aodp_v2.0_release.

Conclusions

Cluster signatures provide a rapid and universal analysis tool to identify all possible short diagnostic DNA markers and variants from any DNA sequencing dataset. They are particularly useful in discriminating genetic material from closely related organisms and in detecting deleterious mutations in highly or perfectly conserved genomic sites.
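
The definition of cluster signatures given in the abstract (subsequences that match perfectly and exclusively a group of sequences) can be illustrated with a small sketch. This is not the aodp implementation: the function name `cluster_signatures`, the fixed signature length `k`, and the toy sequences are assumptions made for illustration, and the sketch ignores reverse complements and any filtering a real pipeline would apply.

```python
from collections import defaultdict

def cluster_signatures(seqs, k):
    """Group k-mers by the exact set of sequences that contain them.

    Hypothetical helper for illustration only; not the aodp algorithm.
    """
    owners = defaultdict(set)  # k-mer -> names of sequences containing it
    for name, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    clusters = defaultdict(set)  # group of names -> k-mers exclusive to that group
    for kmer, names in owners.items():
        clusters[frozenset(names)].add(kmer)
    return dict(clusters)

# Toy sequences invented for this example (not real data)
toy = {
    "A": "ACGTAC",
    "B": "ACGTTC",
    "C": "GGGTTC",
}
sigs = cluster_signatures(toy, 4)

for group, kmers in sigs.items():
    print(sorted(group), sorted(kmers))
```

In this toy data, `ACGT` occurs in A and B but not in C, so it is a signature for the cluster {A, B}; `GTTC` is a signature for {B, C}; and `CGTT` matches B alone, which corresponds to a sequence signature in the abstract's terminology.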

Highlights

  • Oligonucleotide signatures have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept

  • In this study, we evaluated the statistical properties of cluster signatures and their use for mass identification by sequencing

  • Our method is universal as it can find oligonucleotide signatures for unique strains, species, higher level phylogenetic clades or mutations linked to genetic diseases or genetic abnormalities

Summary

Introduction

Oligonucleotide signatures (signatures) have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept. Biodiversity research and surveys require accurate identification of organisms from the environment, especially those of public concern, e.g. quarantine species and select agents monitored by national biosafety and biosecurity programs. Identifying the sequences of concern in ecosystems, e.g. DNA markers or genome regions, is the fundamental strategy [1], especially in the metagenomics era, which requires high-throughput processing without compromising accuracy and sensitivity. A recent study using the same classifier could classify metabarcodes of the 16S rRNA genes to family and genus levels with an accuracy of 75% or lower [25]. The internal transcribed spacer 1 (ITS1) of Tilletia indica, a quarantine pathogen in many countries, and that of T. walkeri, which is not regulated by most countries except for South Korea, differ only by two bases
