Abstract

BackgroundThe main limitations in the analysis of viral metagenomes are perhaps the high genetic variability and the lack of information in extant databases. To address these issues, several bioinformatic tools have been specifically designed or adapted for metagenomics by improving read assembly and creating more sensitive methods for homology detection. This study compares the performance of different available assemblers and taxonomic annotation software using simulated viral-metagenomic data.ResultsWe simulated two 454 viral metagenomes using genomes from NCBI's RefSeq database based on the list of actual viruses found in previously published metagenomes. Three different assembly strategies, spanning six assemblers, were tested for performance: overlap-layout-consensus algorithms Newbler, Celera and Minimo; de Bruijn graphs algorithms Velvet and MetaVelvet; and read probabilistic model Genovo. The performance of the assemblies was measured by the length of resulting contigs (using N50), the percentage of reads assembled and the overall accuracy when comparing against corresponding reference genomes. Additionally, the number of chimeras per contig and the lowest common ancestor were estimated in order to assess the effect of assembling on taxonomic and functional annotation. The functional classification of the reads was evaluated by counting the reads that correctly matched the functional data previously reported for the original genomes and calculating the number of over-represented functional categories in chimeric contigs. The sensitivity and specificity of tBLASTx, PhymmBL and the k-mer frequencies were measured by accurate predictions when comparing simulated reads against the NCBI Virus genomes RefSeq database.ConclusionsAssembling improves functional annotation by increasing accurate assignations and decreasing ambiguous hits between viruses and bacteria. However, the success is limited by the chimeric contigs occurring at all taxonomic levels. The assembler and its parameters should be selected based on the focus of each study. Minimo's non-chimeric contigs and Genovo's long contigs excelled in taxonomy assignation and functional annotation, respectively.tBLASTx stood out as the best approach for taxonomic annotation for virus identification. PhymmBL proved useful in datasets in which no related sequences are present as it uses genomic features that may help identify distant taxa. The k-frequencies underperformed in all viral datasets.

Highlights

  • The main limitations in the analysis of viral metagenomes are perhaps the high genetic variability and the lack of information in extant databases

  • The taxonomic frequencies for viruses in the metagenome simulations (Figure 1A) were consistent with those resulting from the taxonomic classification performed by Reyes and collaborators [12]

  • This was expected since the same taxonomic method, based on tBLASTx [43] with an e-value < 0.001, was used for determining the taxonomic frequencies, and the Viral genome database from the NCBI, used for mapping the reads into reference genomes, shares most entries from the VLP database used by Reyes

Read more

Summary

Introduction

The main limitations in the analysis of viral metagenomes are perhaps the high genetic variability and the lack of information in extant databases. Following the emergence of metagenomics, its quick development responded to the popularization of next-generation platforms These enabled faster and higher throughput approaches to accurately describe the genetic diversity and elucidate the complex relationships existing between the organisms in different ecological niches. It has been determined that the core of the intestinal microbiome is constituted by a definite number of nearly ubiquitous species that show a high variability in terms of abundance [8,10] and that this core of species is not shared between close relatives [1,7] Even though this has been thoroughly explored, as much as 75% of the predicted open reading frames from metagenomic analyses fail to be assigned a function [8]

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.