Abstract

Background: A prime objective in metagenomics is to classify DNA sequence fragments into taxonomic units. This usually requires several stages: read quality control, de novo assembly, contig annotation, gene prediction, etc. These stages need very efficient programs because of the large number of reads that metagenomic projects produce. Furthermore, the complexity of metagenomes requires efficient and automatic tools that orchestrate the different stages.

Method: DATMA is a pipeline for fast metagenomic analysis that orchestrates the following: sequencing quality control, 16S rRNA identification, read binning, de novo assembly and evaluation, gene prediction, and taxonomic annotation. Its distributed computing model can use multiple computing resources to reduce the analysis time.

Results: We used a controlled experiment to show DATMA's functionality, and two pre-annotated metagenomes to compare its accuracy and speed against other metagenomic frameworks. Then, with DATMA, we recovered a draft genome of a novel Anaerolineaceae from a biosolid metagenome.

Conclusions: DATMA is a bioinformatics tool that automatically analyzes complex metagenomes. It is faster than similar tools and, in some cases, it can extract genomes that the other tools do not. DATMA is freely available at https://github.com/andvides/DATMA.

Highlights

  • The world is dominated by microorganisms that, although we cannot see them, are an essential part of all biomes on Earth

  • Even though metagenomics allows a community to be studied without cultivating its species, these datasets contain a mix of sequences from all organisms in the sample, and it is very challenging to determine the origin of each read

  • We show that representing the dataset with an FM-index structure reduces the computational complexity, because each read can be matched against the whole structure without comparing all possible read pairs
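
To illustrate the last point, the following is a minimal Python sketch of how an FM-index lets a query be matched against a whole read set in one pass, instead of comparing it with every read separately. It is not DATMA's implementation, which this page does not describe. The function names, the `#` separator between reads, and the naive suffix sorting are all assumptions made for the example.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text`, which must end with a unique '$'."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' terminator")
    # Suffix array by direct sorting: fine for a demo, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = [text[i - 1] for i in sa]
    # first[c]: number of characters in text that sort strictly before c.
    counts = Counter(text)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in counts}
    for b in bwt:
        for ch, column in occ.items():
            column.append(column[-1] + (ch == b))
    return sa, first, occ


def backward_search(pattern, fm):
    """Return the suffix-array interval [lo, hi) of suffixes starting with pattern."""
    sa, first, occ = fm
    lo, hi = 0, len(sa)
    # Extend the match one character at a time, from the pattern's last character.
    for ch in reversed(pattern):
        if ch not in first:
            return 0, 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count_matches(pattern, fm):
    """Number of occurrences of pattern in the indexed text."""
    lo, hi = backward_search(pattern, fm)
    return hi - lo


def find_positions(pattern, fm):
    """Sorted start positions of pattern in the indexed text."""
    lo, hi = backward_search(pattern, fm)
    return sorted(fm[0][lo:hi])


if __name__ == "__main__":
    # Hypothetical reads, concatenated with '#' separators and a final '$'.
    reads = ["ACGT", "CGTA", "GTAC"]
    fm = build_fm_index("#".join(reads) + "$")
    print(count_matches("CGT", fm), find_positions("CGT", fm))
```

Here the query `CGT` is found in the first two reads with a number of search steps proportional to the query length, regardless of how many reads were indexed. A production index would replace the sorted-suffix construction and the full occurrence table with compressed, sampled structures.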



Introduction

The world is dominated by microorganisms that, although we cannot see them, are an essential part of all biomes on Earth. Next-Generation Sequencing (NGS) platforms can sequence DNA from environmental samples without the need to isolate the species. These experiments are called metagenomics, and they allow the study of microorganisms without prior cultivation. Deoxyribonucleic acid (DNA) is the molecule that contains the instructions for the functions and development of all the cells of living organisms. It is formed by the union of nucleotides, each composed of a monosaccharide sugar, a phosphate group, and a nitrogenous base that can be guanine (G), adenine (A), thymine (T), or cytosine (C).

