Abstract

A large gap remains between sequencing a microbial community and characterizing all of the organisms inside of it. Here we develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. We perform nanopore mNGS on patients with COVID-19, and using a reference database predating COVID-19, demonstrate that BugSplit’s taxonomic binning enables sensitive and specific detection of a novel coronavirus not possible with other approaches. When applied to nanopore mNGS data from cases of Klebsiella pneumoniae and Neisseria gonorrhoeae infection, BugSplit’s taxonomic binning accurately separates pathogen sequences from those of the host and microbiota, and unlocks the possibility of sequence typing, in silico serotyping, and antimicrobial resistance prediction of each organism within a sample. BugSplit is available at https://bugseq.com/academic.
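To make the phrase "absolute improvement in F1-score" concrete, the following minimal sketch computes F1 for two binners and reports the difference in F1 points rather than a relative change. The contig counts are hypothetical and chosen only for illustration; they are not taken from the paper's benchmarks.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts of contigs assigned to the correct species (illustrative only).
baseline = f1_score(true_positives=50, false_positives=30, false_negatives=50)
improved = f1_score(true_positives=85, false_positives=10, false_negatives=15)

# An absolute improvement is a difference in F1 points.
absolute_gain = improved - baseline
# A relative improvement divides that difference by the baseline and is a different quantity.
relative_gain = absolute_gain / baseline

print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
print(f"absolute gain = {absolute_gain:.3f}, relative gain = {relative_gain:.3f}")
```

With these made-up counts the absolute gain is smaller than the relative gain, which is why stating which of the two is being reported matters when comparing tools.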

Highlights

  • We first evaluate BugSplit using three commonly used benchmarking datasets generated with third-generation sequencers: the ZymoBIOMICS Even[21] and Log[22] datasets, which are mock microbial communities of eight bacteria and two yeasts at varying abundance, sequenced on an Oxford Nanopore Technologies (ONT) GridION[23]; and the ZymoBIOMICS Gut Microbiome Standard[24], which contains 19 bacteria and two yeasts, sequenced on a Pacific Biosciences (PacBio) Sequel II

  • We demonstrate that nucleotide alignment of contigs against a reference database enables significantly improved taxonomic binning of metagenomic assemblies when compared to tools that rely on protein alignments

Summary

Introduction

A large gap remains between sequencing a microbial community and characterizing all of the organisms inside of it. We develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. Earlier work on taxonomic binning has relied on amino acid alignments of assembled contigs to a universal protein database[6,7,8,9]. These workflows allow for identification of divergent sequences, but they do not leverage the non-coding and synonymous variation within contigs, nor do they incorporate the positional relationship of classifier features (e.g., co-localization of proteins) into taxonomic classification. K-mer and minimizer-based approaches suffer from the lack of a positional relationship between k-mers, and cannot resolve uncertainty when a single k-mer, or even a single base, is used to break lowest common ancestor ties, as demonstrated in previous evaluations[9].
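The lowest common ancestor (LCA) idea mentioned above can be illustrated with a toy example. This is a minimal sketch of a generic LCA lookup over a hand-written taxonomy fragment, assuming a simple child-to-parent dictionary; it is not BugSplit's algorithm nor the implementation of any particular k-mer classifier, and the `PARENT` table and function names are hypothetical.

```python
# Toy taxonomy as child -> parent links (small illustrative subset).
PARENT = {
    "Klebsiella pneumoniae": "Klebsiella",
    "Klebsiella oxytoca": "Klebsiella",
    "Klebsiella": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the top of the tree, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    return "root"  # fallback when the lineages share nothing


# A sequence feature found in two species of one genus resolves only to the genus.
print(lowest_common_ancestor(["Klebsiella pneumoniae", "Klebsiella oxytoca"]))
# A feature shared across genera resolves only to the family.
print(lowest_common_ancestor(["Klebsiella pneumoniae", "Escherichia coli"]))
```

In this toy setting, a feature that matches more than one species can only be placed at a higher rank, so whichever feature happens to be species-specific decides the final call. That is the kind of tie-breaking by a single k-mer or base that the paragraph above describes as a weakness.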

