Abstract

Background

Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because unwanted sequence contamination can introduce inaccuracies into downstream analyses, reliable tools for pre-processing sequence data are essential.

Results

TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences while allowing for insertions and deletions in the dataset. TagCleaner can also filter the trimmed reads to remove duplicates, short reads, and reads with a high proportion of ambiguous bases. An additional step that screens for and splits artificial sequences arising from fragment-to-fragment concatenations can further increase the quality of the dataset. Users can adjust each filter parameter to suit their own requirements.

Conclusions

TagCleaner is a publicly available web application that automatically detects and efficiently removes tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface provides export functionality for subsequent data processing. TagCleaner is available at http://edwards.sdsu.edu/tagcleaner.
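The abstract does not describe how TagCleaner matches tags internally. As a minimal sketch, assuming the tag sits at the 5' end of each read, a semi-global edit-distance alignment that treats IUPAC ambiguity codes as matching any base they stand for shows how a tag can be trimmed while tolerating insertions, deletions, and ambiguous primer bases. The names `bases_match`, `trim_5prime_tag`, and `max_edits` are hypothetical and are not part of TagCleaner.

```python
# Sketch only: not TagCleaner's actual algorithm. IUPAC codes map to the
# concrete bases they represent (standard nucleotide ambiguity codes).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_match(a: str, b: str) -> bool:
    """True if two IUPAC symbols can stand for at least one common base."""
    return bool(set(IUPAC.get(a.upper(), "")) & set(IUPAC.get(b.upper(), "")))


def trim_5prime_tag(read: str, tag: str, max_edits: int) -> str:
    """Strip `tag` from the start of `read` if the best alignment of the whole
    tag against some read prefix needs at most `max_edits` substitutions,
    insertions, or deletions; otherwise return `read` unchanged."""
    n = len(read)
    # prev[j] holds the edit distance between tag[:i] and read[:j];
    # row i = 0 aligns an empty tag, costing j inserted read bases.
    prev = list(range(n + 1))
    for i in range(1, len(tag) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if bases_match(tag[i - 1], read[j - 1]) else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # tag base missing from the read (deletion)
                curr[j - 1] + 1,     # extra base in the read (insertion)
            )
        prev = curr
    # Pick the read prefix with the cheapest alignment to the full tag;
    # ties go to the shorter prefix so that less sequence is trimmed.
    best_end = min(range(n + 1), key=lambda j: (prev[j], j))
    return read[best_end:] if prev[best_end] <= max_edits else read


# Usage: the tag's ambiguous N matches the G in the read, so the first
# four bases are removed.
print(trim_5prime_tag("ACGTAAA", "ACNT", max_edits=1))  # AAA
```

The dynamic-programming table costs O(|tag| x |read|) per read, which is cheap for short primer tags; limiting the search to the first few dozen bases of each read would bound it further. A 3' tag could be handled the same way by reversing both sequences before trimming and reversing the result afterwards.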

Highlights

  • Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets

  • In the first application example, TagCleaner was applied to three metagenomic datasets available in FASTA format (Table 2)

  • We presented a web-based program that implements several features to improve the pre-processing of the data


Summary

Introduction

Scientific interest in environmental microbial and viral communities grows every year. Metagenomics is widely used to characterize microbial and viral communities for ecological studies and viral discovery across a wide range of environments, such as marine environments, insects, plants, animals, and humans [1,2,3,4]. The major steps of a typical sequence processing pipeline include sequence cleaning, fragment assembly, clustering, taxonomic assignment, and estimation of the community composition. The sequence cleaning step usually includes filtering out duplicated reads, short reads, low-quality reads, contaminants, and reads whose proportion of ambiguous bases (N) exceeds a certain threshold.
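The cleaning filters for duplicated reads, short reads, and ambiguous bases can be combined into a single pass over the reads. The sketch below is a minimal illustration, assuming (identifier, sequence) pairs already parsed from FASTA into memory and exact-match duplicate detection. `filter_reads`, `min_len`, and `max_n_fraction` are hypothetical names, and the default thresholds are arbitrary rather than values taken from TagCleaner. Low-quality filtering is omitted because FASTA input carries no quality scores.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record type: (read identifier, nucleotide sequence).
Record = Tuple[str, str]


def filter_reads(
    records: Iterable[Record],
    min_len: int = 50,
    max_n_fraction: float = 0.05,
) -> Iterator[Record]:
    """Yield reads that are long enough, not too ambiguous, and not exact
    duplicates of a read already kept. Thresholds are illustrative defaults."""
    seen = set()
    for read_id, seq in records:
        key = seq.upper()  # compare sequences case-insensitively
        if not key or len(key) < min_len:
            continue  # empty or short read
        if key.count("N") / len(key) > max_n_fraction:
            continue  # too many ambiguous bases
        if key in seen:
            continue  # exact duplicate of an earlier kept read
        seen.add(key)
        yield read_id, seq


reads = [
    ("r1", "ACGTACGTAC"),
    ("r2", "ACGTACGTAC"),  # duplicate of r1
    ("r3", "ACG"),         # shorter than min_len
    ("r4", "NNNNNACGTA"),  # 50% N
    ("r5", "TTTTGGGGCC"),
]
kept = [rid for rid, _ in filter_reads(reads, min_len=5, max_n_fraction=0.2)]
print(kept)  # ['r1', 'r5']
```

Keeping each retained sequence in a hash set makes duplicate removal linear in the number of reads, at the cost of holding those sequences in memory; very large datasets may call for hashing the sequences or sorting on disk instead.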
