Abstract

Metagenome sequencing provides an unprecedented opportunity to discover unknown microbes and viruses. Metagenomes contain a large number of phages and prokaryotes mixed together. To study the influence of phages on the human body and the environment, it is important to separate phage sequences from the rest of the metagenome. However, novel phages are difficult to identify because their sequences are highly diverse and metagenomes frequently contain short contigs. Here, virSearcher is developed to identify phages in metagenomes by combining a convolutional neural network (CNN) with the gene information of the input sequences. First, an input sequence is encoded according to the different functions of its coding and non-coding regions and is then converted into a word embedding code by a word embedding layer placed before the convolutional layer. In addition, the hit ratio of virus genes is combined with the output of the CNN to further improve the performance of the network. The genes used by virSearcher include both complete and incomplete genes. Experiments on several metagenomes have shown that, compared with other methods, virSearcher significantly improves the identification of short sequences while maintaining performance on long ones. The source code of virSearcher is freely available from http://github.com/DrJackson18/virSearcher.
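
The two inputs described above, a region-aware encoding of the sequence and a virus gene hit ratio, can be illustrated with a minimal Python sketch. This is not the virSearcher implementation. The token vocabulary, the function names (`encode_sequence`, `gene_hit_ratio`, `combine_scores`), the definition of the hit ratio as viral hits divided by predicted genes, and the weighted-average combination are all assumptions made for illustration, since the abstract does not specify them. The embedding and convolutional layers are omitted; `cnn_score` stands in for their output.

```python
from typing import List, Tuple

# Hypothetical vocabulary (an assumption, not taken from virSearcher):
# the same base gets a different token id in coding and non-coding regions,
# so a downstream embedding layer can learn separate vectors for each.
# Id 0 is left free for padding.
BASES = "ACGT"
NONCODING_IDS = {base: i + 1 for i, base in enumerate(BASES)}  # A..T -> 1..4
CODING_IDS = {base: i + 5 for i, base in enumerate(BASES)}     # A..T -> 5..8
UNKNOWN_ID = 9  # ambiguous bases such as N


def encode_sequence(seq: str, coding_regions: List[Tuple[int, int]]) -> List[int]:
    """Encode a DNA sequence as integer tokens that depend on region type.

    coding_regions holds 0-based, half-open (start, end) intervals,
    for example as produced by a gene predictor.
    """
    is_coding = [False] * len(seq)
    for start, end in coding_regions:
        for i in range(max(0, start), min(len(seq), end)):
            is_coding[i] = True

    tokens = []
    for base, coding in zip(seq.upper(), is_coding):
        table = CODING_IDS if coding else NONCODING_IDS
        tokens.append(table.get(base, UNKNOWN_ID))
    return tokens


def gene_hit_ratio(num_genes: int, num_viral_hits: int) -> float:
    """Fraction of predicted genes that match a known virus gene (assumed definition)."""
    if num_genes == 0:
        return 0.0
    return num_viral_hits / num_genes


def combine_scores(cnn_score: float, hit_ratio: float, weight: float = 0.5) -> float:
    """Weighted average of the CNN score and the hit ratio (illustrative choice only)."""
    return (1.0 - weight) * cnn_score + weight * hit_ratio


if __name__ == "__main__":
    tokens = encode_sequence("ACGTN", coding_regions=[(1, 3)])
    print(tokens)
    print(combine_scores(cnn_score=0.8, hit_ratio=gene_hit_ratio(4, 1)))
```

In a full model the token list would be fed to a word embedding layer followed by convolutional layers, and the combination with the hit ratio would more plausibly be learned than fixed by a hand-set weight.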
