Abstract

As prevalent extrachromosomal replicons in many bacteria, plasmids play an essential role in their hosts' evolution and adaptation. The host range of a plasmid refers to the taxonomic range of bacteria in which it can replicate and thrive. Understanding host ranges of plasmids sheds light on studying the roles of plasmids in bacterial evolution and adaptation. Metagenomic sequencing has become a major means to obtain new plasmids and derive their hosts. However, host prediction for assembled plasmid contigs still needs to tackle several challenges: different sequence compositions and copy numbers between plasmids and the hosts, high diversity in plasmids, and limited plasmid annotations. Existing tools have not yet achieved an ideal tradeoff between sensitivity and precision on metagenomic assembled contigs. In this work, we construct a hierarchical classification tool named HOTSPOT, whose backbone is a phylogenetic tree of the bacterial hosts from phylum to species. By incorporating the state-of-the-art language model, Transformer, in each node's taxon classifier, the top-down tree search achieves an accurate host taxonomy prediction for the input plasmid contigs. We rigorously tested HOTSPOT on multiple datasets, including RefSeq complete plasmids, artificial contigs, simulated metagenomic data, mock metagenomic data, the Hi-C dataset, and the CAMI2 marine dataset. All experiments show that HOTSPOT outperforms other popular methods. The source code of HOTSPOT is available via: https://github.com/Orin-beep/HOTSPOT.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.