Abstract

Background: Taxonomic identification of plants and insects is a hard process that demands expert taxonomists and time, and species are often difficult to distinguish by morphology alone. DNA barcodes allow rapid species discovery and identification and have been widely used for taxonomic identification by targeting known gene regions that discriminate among species. DNA barcode sequence analysis is usually carried out with processes and tools that still demand a high degree of interaction with the user or researcher. To minimize this interaction, we propose PIPEBAR, a pipeline for analyzing DNA chromatograms from Sanger platform sequencing that ensures high-quality consensus sequences along with efficient running time. We also propose a paired-end read assembly tool, OverlapPER, which can be used in sequence with PIPEBAR or independently of it.

Results: PIPEBAR is a command-line tool that automates the processing of large numbers of trace files. It is as accurate as the proprietary Geneious tool and faster than the most popular software for barcoding analysis: 7 times faster than Geneious and 14 times faster than SeqTrace when processing hundreds of barcoding sequences. OverlapPER is a novel tool for accurately overlapping paired-end reads that accepts both substitution and indel errors and returns both the overlapped and the non-overlapped regions of a pair of reads. OverlapPER obtained the best results among currently used tools when merging 1,000,000 simulated paired-end reads.

Conclusions: PIPEBAR and OverlapPER run on most operating systems and are freely available, along with supporting code and documentation, at https://sourceforge.net/projects/PIPEBAR/ and https://sourceforge.net/projects/overlapper-reads/.
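To make the idea of overlapping a read pair concrete, the following is a minimal Python sketch. It is not OverlapPER's algorithm: it assumes an ungapped overlap, tolerates substitutions only (no indels), ignores base qualities, and simply keeps the forward read's bases in the overlap. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the returned dictionary layout are all hypothetical choices made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatch_rate=0.1):
    """Overlap a forward read with the reverse complement of its mate.

    Illustrative assumptions (not taken from the paper): the overlap is
    ungapped, only substitutions are tolerated, and the longest acceptable
    overlap wins. Returns None when no acceptable overlap is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    mate = reverse_complement(reverse)
    # Try the longest candidate overlap first: suffix of forward vs prefix of mate.
    for k in range(min(len(forward), len(mate)), min_overlap - 1, -1):
        suffix, prefix = forward[-k:], mate[:k]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * k:
            return {
                "left": forward[:-k],    # non-overlapped part of the forward read
                "overlap": suffix,       # overlapped region (forward bases kept)
                "right": mate[k:],       # non-overlapped part of the mate
                "mismatches": mismatches,
            }
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACGATCGGATCCATGC"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[10:])
    result = merge_pair(fwd, rev)
    print(result["left"] + result["overlap"] + result["right"])
```

Returning the left, overlap, and right parts separately mirrors the abstract's point that both overlapped and non-overlapped regions are reported; handling indels would require a gapped alignment in place of the position-by-position comparison used here.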

Highlights

  • We provide an additional step that corrects stop codons and frameshifts in the final sequences assembled by PIPEBAR when they originate from coding regions, which facilitates submitting such sequences to barcode databases such as the Barcode of Life Data System (BOLD) and the National Center for Biotechnology Information (NCBI) [29]

  • The remaining sections of the paper are organized as follows: Implementation, where we show how PIPEBAR and OverlapPER operate; Results and Discussion, where we test both PIPEBAR and OverlapPER and present and discuss the results; and Conclusion, where we summarize the tools and their importance to the scientific community
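As a rough illustration of the stop-codon check mentioned in the highlights, the sketch below scans a consensus sequence for in-frame stop codons. It only detects them; it does not reproduce PIPEBAR's correction step, whose details are not given here. The function names and the choice to pass the stop-codon set as a parameter are assumptions for illustration. The two codon sets correspond to the standard genetic code and the invertebrate mitochondrial code (where TGA encodes tryptophan), since the appropriate code depends on the barcode marker.

```python
# Stop codons in the standard genetic code.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
# Stop codons in the invertebrate mitochondrial code (TGA encodes Trp there).
INVERTEBRATE_MITO_STOPS = frozenset({"TAA", "TAG"})


def stop_positions(seq, frame=0, stop_codons=STANDARD_STOPS):
    """Return 0-based start positions of in-frame stop codons.

    `frame` is 0, 1, or 2; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in stop_codons
    ]


def frames_without_stops(seq, stop_codons=STANDARD_STOPS):
    """Return the forward frames (0-2) that contain no stop codon."""
    return [f for f in range(3) if not stop_positions(seq, f, stop_codons)]


if __name__ == "__main__":
    consensus = "ATGGCCTAAGGT"
    for frame in range(3):
        print(frame, stop_positions(consensus, frame))
```

A sequence with no stop-free frame is a candidate for manual review, since an unexpected stop codon in a coding barcode often points to a base-calling error or a frameshift.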
