Abstract
As the main regulators of microbial community composition, bacteriophages are found widely on Earth. However, because they are hidden in metagenomes, most of them remain unknown. To identify phages from metagenomes more effectively, this paper presents a new tool named VFM (Virus Finding & Mining). VFM has two versions: bin-VFM and unbin-VFM. Eighteen new features describing codon usage bias, the proportion of hits to clusters of orthologous groups of proteins (COG), and 1-mer and 2-mer frequencies are introduced to improve classifier performance. By using missing value interpolation, bin-VFM significantly improves classification performance for bins of short sequences. Compared with previous virus-mining tools, bin-VFM and unbin-VFM perform much better on simulated and real metagenomes with short and long sequences, respectively. VFM may therefore be helpful in studies of metagenome-related problems such as horizontal gene transfer and antibiotic resistance. VFM is freely available at https://github.com/liuql2019/VFM.
Highlights
Viruses are the most abundant and widespread life forms on Earth [1]
The overall design of VFM: to achieve better performance for phage mining, eighteen new features related to codon usage bias, the clusters of orthologous groups of proteins (COG) gene ratio, and short k-mer frequency (k = 1, 2) were added to the six previously reported features [38], [41] to create a new, longer feature vector.
Codon usage bias refers to the fact that different species often prefer different synonymous codons in their genes; this preference can be used as a gene-related marker for identification [42].
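To make the k-mer and codon usage features named above more concrete, the following is a minimal Python sketch, not VFM's actual implementation. The function names `kmer_frequencies` and `synonymous_codon_usage` are hypothetical, and the choice of the leucine codon family is only an illustration. How VFM condenses such values into its eighteen features is not described in this summary, so the sketch simply shows the raw quantities: normalized 1-mer and 2-mer frequencies, and the share of each synonymous codon within one amino acid's codon family.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# The six synonymous codons for leucine in the standard genetic code.
# Using leucine here is an illustrative assumption, not VFM's choice.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def kmer_frequencies(seq, k):
    """Return the normalized frequency of every A/C/G/T k-mer in seq.

    k-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def synonymous_codon_usage(cds, family=LEUCINE_CODONS):
    """Return each codon's share among the synonymous codons in family.

    cds is assumed to be an in-frame coding sequence; trailing bases that
    do not complete a codon are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in family)
    return {c: (counts[c] / total if total else 0.0) for c in family}


if __name__ == "__main__":
    gene = "ATGCTGCTGTTAAAC"  # hypothetical toy coding sequence
    print(kmer_frequencies(gene, 1))
    print(kmer_frequencies(gene, 2))
    print(synonymous_codon_usage(gene))
```

A strongly uneven distribution within a codon family is what is meant by bias: two organisms encoding the same protein may differ markedly in which leucine codons they use, and that difference is the kind of signal a classifier can exploit.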
Summary
Viruses are the most abundant and widespread life forms on Earth [1]. Their habitats include host bodies [2], such as humans [3]–[5], animals [6]–[8], insects [9], and plants [10], as well as natural environments [11], including marine [12]–[14], freshwater [15], springs [16], soil [17]–[19], and other niches [20], [21]. The identification of phage sequences from a variety of metagenomes plays a crucial role in metagenomic research.