MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes

Carlos A Ruiz-Perez, Konstantinos T Konstantinidis, Roth E Conrad

doi:10.1186/s12859-020-03940-5

Open Access

https://doi.org/10.1186/s12859-020-03940-5

Journal: BMC Bioinformatics
Publication Date: Jan 6, 2021
Citations: 72
License type: open-access

Affiliation: Georgia Institute of Technology

Abstract

Background: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, these are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require large amounts of computing power for high-throughput analysis not available to the average user.

Results: Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes that combines results from several reference protein databases and returns the matching annotations together with key metadata such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among (multiple) query genomes and cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under an open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator.

Conclusions: We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools.
We showed that the use of multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome than faster tools that use reduced databases, and that it is computationally efficient enough for use on personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines, while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.
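
The grouping of genomes by metabolic similarity mentioned in the Results can be pictured with a small sketch. The abstract does not state which distance metric or clustering method MicrobeAnnotator uses, so the Euclidean distance, the genome names, and the completeness values below are all assumptions made for illustration. Finding the closest pair of genomes corresponds to the first merge of an agglomerative clustering.

```python
from itertools import combinations
from math import sqrt

# Hypothetical table: genome -> percent completeness of three KEGG modules.
# Names and values are invented for this sketch, not taken from the paper.
completeness = {
    "genome_A": [100.0, 80.0, 0.0],
    "genome_B": [100.0, 75.0, 0.0],
    "genome_C": [20.0, 0.0, 100.0],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length completeness profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_pair(table):
    """Return the two genome names whose profiles are most similar."""
    return min(
        combinations(sorted(table), 2),
        key=lambda pair: euclidean(table[pair[0]], table[pair[1]]),
    )


pair = closest_pair(completeness)
print(pair)
```

In this toy table, genome_A and genome_B differ in only one module and are therefore grouped first; a real analysis would use a full clustering routine over many modules.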

Highlights

High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes.
Computing requirements of MicrobeAnnotator compared to other tools: we compared MicrobeAnnotator to other popular genome annotation pipelines, including Prokaryotic annotation (Prokka) v1.14.6 [8], Server for Rapid Annotations using Subsystem Technology (RAST) [7], Evolutionary genealogy of genes (EggNOG)-mapper v2.0.1b-4-g4c2b55e, InterProScan v5.47-82.0, and Distilled and Refined Annotation of Metabolism (DRAM).
As previously suggested [23], we counted the number of FASTA protein entries or Hidden Markov Models (HMMs), or used the database sizes reported on each tool's website, depending on the tool.
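
Counting FASTA protein entries amounts to counting header lines, since each record begins with a line starting with `>`. The following is a minimal sketch of that idea, not the authors' script; the function name `count_fasta_entries` and the two example records are hypothetical.

```python
import io


def count_fasta_entries(handle):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Two invented protein records, used only to exercise the function.
example = io.StringIO(
    ">protein_1 hypothetical entry\n"
    "MKTAYIAKQR\n"
    "QISFVKSHFS\n"
    ">protein_2 hypothetical entry\n"
    "MSLNDTRE\n"
)
n_entries = count_fasta_entries(example)
print(n_entries)
```

For a database on disk, the same function can be given an open text file handle in place of the in-memory example.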

Summary

Results

Computing Requirements of MicrobeAnnotator compared to other tools

We compared MicrobeAnnotator to other popular genome annotation pipelines, including Prokka v1.14.6 [8], RAST [7], EggNOG-mapper v2.0.1b-4-g4c2b55e, InterProScan v5.47-82.0, and DRAM. The EggNOG-mapper annotations were closer to the MicrobeAnnotator and DRAM annotations, while those of RAST and Prokka were the most distinct. Upon closer inspection, this result is due to differences in identifying proteins involved in several metabolic pathways (Additional file 2: Table S10). A quick test using the original MicrobeAnnotator summaries complemented with translated E.C. identifiers (extracted from the original MicrobeAnnotator annotations) showed a pattern similar to that observed for Prokka and RAST (Additional file 2: Tables S10–S11). In this case, the "complemented" version of the original MicrobeAnnotator summary found 37 modules that were 20% more complete than in the original summary. Computing time followed a similar pattern with the E. coli dataset: MicrobeAnnotator and DRAM (with UniRef90) required the longest times to complete the annotations (~1.8 h and ~3.2 h per genome, respectively), followed by EggNOG-mapper (~1.1 h; Fig. 3a). The use of additional identifiers from other databases could be useful and complementary but should be inspected carefully to avoid false positives.
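
The effect of complementing a summary with translated E.C. identifiers can be illustrated with a toy calculation. This is a sketch under stated assumptions, not MicrobeAnnotator's code: module definitions are simplified to a list of steps with alternative KOs, all identifiers (`KO_1`, `EC_a`, `module_X`, and so on) are placeholders, the E.C.-to-KO table is hypothetical, and the 20% figure is read here as a gain of at least 20 percentage points.

```python
def module_completeness(steps, kos):
    """Percent of module steps for which at least one alternative KO is present."""
    hit = sum(1 for alternatives in steps if alternatives & kos)
    return 100.0 * hit / len(steps)


# Toy module definitions: each step is a set of interchangeable KOs.
# Real KEGG module definitions are more complex than this.
modules = {
    "module_X": [{"KO_1"}, {"KO_2", "KO_3"}, {"KO_4"}, {"KO_5"}, {"KO_6"}],
    "module_Y": [{"KO_7"}, {"KO_8"}],
}

original_kos = {"KO_1", "KO_2", "KO_7"}            # KOs from the original summary
ec_to_ko = {"EC_a": {"KO_4"}, "EC_b": {"KO_5"}}    # hypothetical translation table
annotated_ecs = {"EC_a", "EC_b"}                   # E.C. numbers in the annotations

# Complement the original KO set with KOs translated from E.C. identifiers.
complemented_kos = set(original_kos)
for ec in annotated_ecs:
    complemented_kos |= ec_to_ko.get(ec, set())

# Keep modules whose completeness rose by at least 20 percentage points.
gained = {}
for name, steps in modules.items():
    before = module_completeness(steps, original_kos)
    after = module_completeness(steps, complemented_kos)
    if after - before >= 20.0:
        gained[name] = (before, after)

print(gained)
```

Because one E.C. number can map to several KOs, a translation like this may mark steps as present that the genome does not actually encode, which is why the complemented results call for careful inspection.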
