Abstract

Background: The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. Among the problems that researchers in the field have to face, one of the most challenging is the taxonomic classification of metagenomic reads, i.e., identifying the microorganisms that are present in a sample collected directly from the environment. The analysis of environmental samples (metagenomes) is particularly important for determining the microbial composition of different ecosystems, and it is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, and in ecology they can provide valuable insights into the functions of environmental communities.

Results: In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. The tool LiME (Lightweight Metagenomics via eBWT) is available at https://github.com/veronicaguerrini/LiME.

Conclusions: In order to assess the reliability of our approach, we ran several experiments on NGS data from two simulated metagenomes among those provided in benchmarking analysis and on a real metagenome from the Human Microbiome Project. The experimental results on the simulated data show that LiME is competitive with widely used taxonomic classifiers. It achieves high levels of precision and specificity (e.g., 99.9% of the positive control reads are correctly assigned, and the percentage of classified reads of the negative control is less than 0.01%) while keeping a high sensitivity. On the real metagenome, we show that LiME is able to deliver classification results comparable to those of MagicBlast. Overall, the experiments confirm the effectiveness of our method and its high accuracy even in negative control samples.

Highlights

  • The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences

  • The analysis of environmental samples is important for determining the microbial composition of different ecosystems, and it is used in several fields: for example, metagenomic studies in agriculture are being used to explore the relations between microbes and plants and to detect crop diseases

  • We present a tool for the metagenomic classification task, called LiME (Lightweight Metagenomics via eBWT, where eBWT stands for extended Burrows-Wheeler transform), that takes as input the Burrows-Wheeler Transform enhanced with the document array (DA) and the longest common prefix (LCP) array [29]
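
To make the three data structures named above concrete, here is a minimal sketch that builds them naively for a tiny collection of strings. It is an illustration under stated assumptions, not LiME's implementation: the function name `ebwt_da_lcp` is hypothetical, the sketch uses the common variant in which each string gets its own end-marker (all printed as `$`, ordered by string index), and it sorts suffixes explicitly, which is far from lightweight and only suitable for toy inputs.

```python
def ebwt_da_lcp(strings):
    """Naively build (eBWT, DA, LCP) for a small collection of strings.

    Assumption: each string i is terminated by its own end-marker, smaller
    than every letter, with end-markers ordered by string index.
    """
    rows = []
    for doc, s in enumerate(strings):
        for start in range(len(s) + 1):
            # Letters are encoded as (1, char) and the end-marker of string
            # `doc` as (0, doc), so end-markers sort first and are distinct.
            key = [(1, c) for c in s[start:]] + [(0, doc)]
            # Symbol preceding the suffix; the full string is preceded by
            # its end-marker, printed here as "$".
            prev = s[start - 1] if start > 0 else "$"
            rows.append((key, doc, prev))
    rows.sort(key=lambda row: row[0])

    ebwt = "".join(prev for _, _, prev in rows)  # preceding symbols, in suffix order
    da = [doc for _, doc, _ in rows]             # which string each suffix comes from
    lcp = [0] * len(rows)                        # shared prefix length with previous suffix
    for i in range(1, len(rows)):
        a, b = rows[i - 1][0], rows[i][0]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[i] = n
    return ebwt, da, lcp


ebwt, da, lcp = ebwt_da_lcp(["ACG", "ACT"])
print(ebwt)  # GT$$AACC
print(da)    # [0, 1, 0, 1, 0, 1, 0, 1]
print(lcp)   # [0, 0, 0, 2, 0, 1, 0, 0]
```

In this toy output, the LCP value 2 marks two adjacent suffixes that share the prefix `AC`, and the corresponding DA entries (0 and 1) show that they come from different strings. Reading the arrays side by side in this way is the kind of sequential scan the abstract refers to.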


Summary

Introduction

The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. The analysis of environmental samples (metagenomes) is important for determining the microbial composition of different ecosystems, and it is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, and in ecology they can provide valuable insights into the functions of environmental communities. The aim is to subdivide the reads or the sequences assembled from metagenomic reads into discrete units, without the need for references, so that sequences clustered together represent individual populations that comprise the microbial community. This approach is known as reference-free binning [8].

