Robustness analysis of metabolic predictions in algal microbial communities based on different annotation pipelines.

Elham Karimi, Erwan Corre, Enora Geslain, Simon M Dittami, Clémence Frioux, Arnaud Belcour, Méziane Aïte, Anne Siegel

doi:10.7717/peerj.11344

Abstract

Animals, plants, and algae rely on symbiotic microorganisms for their development and functioning. Genome sequencing and genomic analyses of these microorganisms provide opportunities to construct metabolic networks and to analyze the metabolism of the symbiotic communities they constitute. Genome-scale metabolic network reconstructions rest on information gained from genome annotation. As there are multiple annotation pipelines available, the question arises to what extent differences in annotation pipelines impact outcomes of these analyses. Here, we compare five commonly used pipelines (Prokka, MaGe, IMG, DFAST, RAST) from predicted annotation features (coding sequences, Enzyme Commission numbers, hypothetical proteins) to the metabolic network-based analysis of symbiotic communities (biochemical reactions, producible compounds, and selection of minimal complementary bacterial communities). While Prokka and IMG produced the most extensive networks, RAST and DFAST networks produced the fewest false positives and the most connected networks with the fewest dead-end metabolites. Our results underline differences between the outputs of the tested pipelines at all examined levels, with small differences in the draft metabolic networks resulting in the selection of different microbial consortia to expand the metabolic capabilities of the algal host. However, the consortia generated yielded similar predicted producible compounds and could therefore be considered functionally interchangeable. This contrast between selected communities and community functions depending on the annotation pipeline needs to be taken into consideration when interpreting the results of metabolic complementarity analyses. In the future, experimental validation of bioinformatic predictions will likely be crucial to both evaluate and refine the pipelines and needs to be coupled with increased efforts to expand and improve annotations in reference databases.

Highlights

Plants, animals, and algae are hosts to a large diversity of microorganisms
Magnifying Genomes (MaGe) searches for functional features using UniProtKB/Swiss-Prot, InterPro, FIGFAM, COG, and ENZYME, with Diamond as a search tool (Vallenet et al, 2019); Rapid Annotations using Subsystems Technology (RAST) predicts gene functions using the SEED database (Aziz et al, 2008); Integrated Microbial Genomes (IMG) predicts gene features based on COGs, Pfams, and TIGRFAMs, as well as the KEGG and MetaCyc databases; Prokka is a command-line tool using UniProt, Pfam, and TIGRFAMs; DDBJ Fast Annotation and Submission Tool (DFAST) bases its functional annotations on ortholog searches with reciprocal BLAST, HMM searches against TIGRFAMs, and COG assignments (Tanizawa, Fujisawa & Nakamura, 2017)
To assess how these differences in the draft metabolic networks impacted the function of the predicted metabolism of algal–bacterial holobionts, we examined the list of metabolites that could be produced by the algal metabolic network when combined with the 81 draft bacterial networks for each annotation pipeline
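As an illustration of the last point, the set of metabolites producible by a host network combined with bacterial networks can be sketched with a simple network-expansion rule: a reaction fires once all of its substrates are available, and its products then become available in turn. This is only a toy sketch under stated assumptions, not the method used in the study. The function names (scope, added_value), the pipeline and strain labels, and every reaction and metabolite below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A reaction is modelled as (substrates, products); this is an assumption of the sketch.
Reaction = Tuple[Set[str], Set[str]]


def scope(reactions: List[Reaction], seeds: Set[str]) -> Set[str]:
    """Return every metabolite reachable from the seeds by network expansion."""
    producible = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire only if all of its substrates are available.
            if substrates <= producible and not products <= producible:
                producible |= products
                changed = True
    return producible


def added_value(host: List[Reaction],
                bacteria: Dict[str, List[Reaction]],
                seeds: Set[str]) -> Set[str]:
    """Metabolites producible by host plus bacteria but not by the host alone."""
    host_scope = scope(host, seeds)
    merged = list(host)
    for network in bacteria.values():
        merged.extend(network)
    return scope(merged, seeds) - host_scope


# Hypothetical toy data: one host network and one strain annotated by two pipelines.
host = [({"glucose"}, {"pyruvate"}), ({"pyruvate", "B12"}, {"methionine"})]
seeds = {"glucose"}
bacteria_by_pipeline = {
    "pipeline_A": {"strain_1": [({"glucose"}, {"B12"})]},  # reaction annotated
    "pipeline_B": {"strain_1": []},                        # reaction missed
}

for name, networks in bacteria_by_pipeline.items():
    print(name, sorted(added_value(host, networks, seeds)))
```

In this toy case, a single reaction missed by one pipeline changes which compounds the community is predicted to add to the host, which is the kind of effect the comparison across annotation pipelines is meant to expose.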

Summary

Introduction

Plants, animals, and algae are hosts to a large diversity of microorganisms. The importance of these symbiotic microbes for the development and functioning of their hosts is widely accepted (Amin et al, 2015; Fraune & Bosch, 2010; McFall-Ngai et al, 2013; Philippot et al, 2013). This is also true for brown algae, whose surfaces provide an attractive substrate for different bacterial phyla, most importantly the Proteobacteria, Bacteroidetes, Firmicutes, and Actinobacteria. For instance, the choice of the bioinformatic pipeline has been shown to have a significant impact on some of the biological conclusions that could be drawn (Siegwald et al, 2019).

Objectives

Methods

Results

Conclusion