Abstract

Newly emerging influenza viruses continue to threaten public health. Rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using a Support Vector Machine (SVM) classifier based on word vectors, a new representation and feature extraction method for biological sequences. The results show that the length of the words within the word vector, the sequence type (DNA or protein) and the species from which the sequences used to generate the word vectors were derived all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene using word vectors (three-letter words) generated from DNA sequences of the influenza virus. This model achieved accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared with sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvement in predicting the host of influenza A viruses.
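To make the feature extraction step concrete, the following is a minimal sketch of how a sequence might be turned into a fixed-length feature vector from three-letter words. It is an illustration under stated assumptions, not the authors' implementation: the overlapping-window tokenization, the averaging of word vectors, the function names `kmer_words` and `mean_vector`, and the toy two-dimensional embedding table are all hypothetical. In practice the word vectors would be learned from a sequence dataset, and the resulting feature vector would be passed to a classifier such as an SVM.

```python
from typing import Dict, List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-letter words.

    Overlapping windows are an assumption; the abstract only states
    that words of three letters were used.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_vector(words: List[str],
                table: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the vectors of all words found in the embedding table.

    Words missing from the table are skipped. Averaging is one simple
    (assumed) way to obtain a fixed-length feature vector per sequence.
    """
    total = [0.0] * dim
    found = 0
    for word in words:
        vec = table.get(word)
        if vec is None:
            continue
        for j, value in enumerate(vec):
            total[j] += value
        found += 1
    if found == 0:
        return total
    return [t / found for t in total]


# Hypothetical toy embedding table; real word vectors would be trained.
toy_table = {"ATG": [1.0, 0.0], "TGG": [0.0, 1.0]}

words = kmer_words("ATGGCA")            # ['ATG', 'TGG', 'GGC', 'GCA']
features = mean_vector(words, toy_table, dim=2)
```

Here `features` is the average of the two known word vectors, since `GGC` and `GCA` are absent from the toy table and are skipped.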

Highlights

  • We made a first attempt to predict the host of influenza A viruses based on word vectors derived from an influenza protein dataset

Introduction

The influenza virus is a negative-sense, single-stranded, segmented RNA virus. Its genome comprises eight segments and mainly encodes twelve proteins: the two surface proteins HA and NA, and ten internal proteins (PB2, PB1, PA, NP, M1, M2, NS1, NS2, PA-X and PB1-F2). Influenza viruses are mainly divided into types A, B and C, and type A is further divided into subtypes according to the HA and NA proteins, such as H3N2, H1N1 and H5N1 (Taubenberger & Kash, 2010). Type B and C influenza viruses mainly infect humans, whereas type A can infect a wide range of species, including birds (such as poultry) and mammals (such as pigs, bats and humans) (Webster et al., 1992). Avian, human and swine influenza viruses are the most commonly observed and cause large health and economic losses to human society. Human infections by what were considered typical avian and swine strains have become more common, for instance
