Abstract

Metagenomics and marker gene approaches, coupled with high-throughput sequencing technologies, have revolutionized the field of microbial ecology. Metagenomics is a culture-independent method that allows the identification and characterization of organisms from virtually any type of sample. Whole-genome shotgun sequencing analyses the total DNA of a chosen sample to determine the presence of micro-organisms from all domains of life and their genomic content. Importantly, the whole-genome shotgun sequencing approach not only reveals the genomic diversity present but can also give insights into the functional potential of the micro-organisms identified. The marker gene approach is based on sequencing a specific gene region; it describes the microbial composition of a sample in terms of the taxonomic groups present and is frequently used to analyse the biodiversity of microbial ecosystems. Despite their importance, analysing metagenomic sequencing and marker gene data remains challenging. Here we review the primary workflows and software used for both approaches and discuss the current challenges in the field.

Highlights

  • Metagenomics refers to the application of sequencing techniques to analyse the totality of the genomic material present in a sample [1]

  • The principal marker genes used in microbial ecology are the 16S rRNA gene [2], the internal transcribed spacer (ITS) region [3] and the 18S rRNA gene [4]

  • We focus on the analysis of Illumina platform-derived data, since this sequencing technology is the most commonly used in metagenomic studies


Summary

INTRODUCTION

Metagenomics refers to the application of sequencing techniques to analyse the totality of the genomic material present in a sample [1].

A more recent comparison between FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia and Prodigal found that FragGeneScan is better suited to calling genes in error-containing fragments, while Prodigal, MetaGeneAnnotator and MetaGeneMark are better suited to higher-quality sequences, such as assembled contigs [113]. Despite these comparative studies, the most widely used strategy for identifying protein-coding genes, and probably the best one, combines several gene-calling tools; for example, the JGI annotation pipeline [114] uses GeneMark.hmm, MetaGeneAnnotator, Prodigal and FragGeneScan.

Although these data have the potential to reveal which organisms encode these functions, this question is generally answered using marker gene profiling, an approach described in detail later in this review.

Meaningful alpha and beta diversity metrics (Box 1) can be calculated with different software, including QIIME 2 [188], mothur [198], USEARCH [159] and the R packages phyloseq [247], microbiome [248] and vegan [249].
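The idea of combining several gene callers can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the JGI pipeline's actual procedure: each tool's output is assumed to have already been parsed into hypothetical (contig, start, end, strand) tuples, and a prediction is kept only when at least `min_support` tools report the same contig, strand and stop coordinate. Matching on the stop rather than the start is a simplification chosen here because start-codon assignment is typically where gene callers disagree most. The names `consensus_genes`, `stop_key` and `min_support` are hypothetical, and the coordinates are invented toy data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical parsed prediction: (contig_id, start, end, strand) with
# 1-based inclusive coordinates and strand "+" or "-".
Gene = Tuple[str, int, int, str]
Key = Tuple[str, str, int]


def stop_key(gene: Gene) -> Key:
    """Identify a prediction by contig, strand and stop-codon coordinate.

    On the "+" strand the stop codon lies at the end coordinate; on the "-"
    strand it lies at the start coordinate.
    """
    contig, start, end, strand = gene
    stop = end if strand == "+" else start
    return (contig, strand, stop)


def consensus_genes(predictions: Dict[str, List[Gene]], min_support: int = 2) -> List[Gene]:
    """Keep genes whose stop coordinate is reported by at least `min_support` tools.

    When tools disagree on the start, the longest reported variant is kept
    (an arbitrary choice made for this sketch).
    """
    support: Dict[Key, Set[str]] = defaultdict(set)
    longest: Dict[Key, Gene] = {}
    for tool, genes in predictions.items():
        for gene in genes:
            key = stop_key(gene)
            support[key].add(tool)
            best = longest.get(key)
            if best is None or gene[2] - gene[1] > best[2] - best[1]:
                longest[key] = gene
    kept = [longest[key] for key, tools in support.items() if len(tools) >= min_support]
    return sorted(kept, key=lambda g: (g[0], g[1], g[2]))


# Toy predictions (invented coordinates) keyed by the tool that produced them.
predictions = {
    "Prodigal": [("contig1", 100, 399, "+"), ("contig1", 900, 1199, "-")],
    "FragGeneScan": [("contig1", 130, 399, "+")],
    "MetaGeneAnnotator": [("contig1", 100, 399, "+"), ("contig1", 2000, 2299, "+")],
}
print(consensus_genes(predictions))  # [('contig1', 100, 399, '+')]
```

Raising `min_support` trades sensitivity for precision. A real pipeline would also have to resolve overlapping genes in different frames and reconcile tool-specific output formats, both of which this sketch ignores.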
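Box 1 is not reproduced in this section, so the following is only a minimal sketch of how a few commonly used diversity metrics are computed. It assumes each sample is a vector of read counts per taxon (e.g. OTUs or ASVs), listed in the same taxon order for every sample, and uses p_i for the relative abundance of taxon i. For alpha diversity it shows observed richness, the Shannon index H' = -Σ p_i ln(p_i) and the Gini-Simpson index 1 - Σ p_i²; for beta diversity it shows the Bray-Curtis dissimilarity Σ|x_i - y_i| / Σ(x_i + y_i) between two samples. The function names and toy counts are hypothetical; the packages cited above provide tested implementations of these and many other metrics.

```python
import math
from typing import List, Sequence


def richness(counts: Sequence[int]) -> int:
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def relative_abundances(counts: Sequence[int]) -> List[float]:
    """Convert raw counts to proportions p_i that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i); taxa with zero counts contribute 0."""
    return sum(-p * math.log(p) for p in relative_abundances(counts) if p > 0)


def gini_simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): probability that two reads drawn
    with replacement belong to different taxa."""
    return 1.0 - sum(p * p for p in relative_abundances(counts))


def bray_curtis(x: Sequence[int], y: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity on raw counts: 0 = identical, 1 = no shared taxa."""
    if len(x) != len(y):
        raise ValueError("samples must list the same taxa in the same order")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Toy counts for four taxa (invented for illustration only).
sample_a = [10, 10, 10, 10]  # perfectly even community
sample_b = [40, 0, 0, 0]     # dominated by a single taxon

print(richness(sample_a), richness(sample_b))  # 4 1
print(round(shannon(sample_a), 4))             # 1.3863 (= ln 4)
print(gini_simpson(sample_a))                  # 0.75
print(bray_curtis(sample_a, sample_b))         # 0.75
```

Here Bray-Curtis is applied to raw counts. Normalisation choices such as rarefaction or conversion to relative abundances change the result and are outside the scope of this sketch.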

A GLANCE AT THE MAIN STATISTICAL
Findings
CONCLUSIONS