Abstract

Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of the next pandemic and the IAVs potentially involved are currently unpredictable. We aimed to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed into their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (approximately 1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. The human adaptation of IAV sequences was then predicted from the characterized (d)nts. For the six segments, 9, 12, 11, 13, 10, and 9 human-adaptive (d)nts were optimized, respectively. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The confusion matrices and the areas under the receiver operating characteristic curve indicated that the ML models performed well in predicting human adaptation of IAVs. Our model also performed well in predicting the human adaptation of swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction that use large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, such models allow the prediction of pandemic influenza.
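The two evaluation measures named in the abstract, the confusion matrix and the area under the receiver operating characteristic curve (AUC), can be illustrated with a minimal Python sketch. This is not the authors' code: the label encoding (1 for human-adaptive, 0 for avian-adaptive), the 0.5 decision threshold, and the function names `confusion_matrix` and `roc_auc` are assumptions made for illustration, and the toy scores are invented inputs, not results from the study.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels.

    Assumed encoding: 1 = human-adaptive, 0 = avian-adaptive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy inputs for illustration only (not data from the study).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
# Hypothetical 0.5 threshold to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

The pairwise AUC definition used here is quadratic in the number of samples; it is chosen for readability, and a rank-based implementation would be preferable at the scale of the data set described above.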

Highlights

  • Type A influenza viruses (IAVs) infect a wide range of avian and mammalian hosts, generally with species specificity

  • Prediction pipeline and data processing of the genomic nucleotide composition in IAVs: as the workflow diagram in Figure 1A shows, data wrangling was performed for IAV open reading frame (ORF) sequences

  • Twelve types of mononucleotides and 48 types of dinucleotides in the ORF were counted for all the sequence samples
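
The counting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the 12 mononucleotide types are the four bases at each of the three codon positions, and the 48 dinucleotide types are the 16 base pairs at the three position pairs 1-2, 2-3, and 3-1 (the last spanning adjacent codons). The function name `codon_position_composition`, the feature labels such as `A1` or `AT12`, and the normalization to frequencies within each position are also illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def codon_position_composition(orf: str) -> dict:
    """Return 12 position-specific mononucleotide and 48 position-specific
    dinucleotide frequencies for an in-frame coding sequence."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")

    nt_counts = Counter()
    dnt_counts = Counter()
    for i, base in enumerate(seq):
        pos = i % 3 + 1          # codon position of this base: 1, 2, or 3
        nxt = pos % 3 + 1        # codon position of the following base
        if base in BASES:
            nt_counts[f"{base}{pos}"] += 1
        pair = seq[i:i + 2]
        # Skip the last base (no pair) and pairs with ambiguous symbols.
        if len(pair) == 2 and all(b in BASES for b in pair):
            dnt_counts[f"{pair}{pos}{nxt}"] += 1

    features = {}
    for pos in (1, 2, 3):
        nxt = pos % 3 + 1
        nt_keys = [f"{b}{pos}" for b in BASES]
        dnt_keys = [f"{a}{b}{pos}{nxt}" for a, b in product(BASES, repeat=2)]
        # Normalize within each codon position (an assumption of this sketch).
        for keys, counts in ((nt_keys, nt_counts), (dnt_keys, dnt_counts)):
            total = sum(counts[k] for k in keys)
            for k in keys:
                features[k] = counts[k] / total if total else 0.0
    return features


# Toy two-codon sequence for illustration only.
features = codon_position_composition("ATGGCC")
```

Each sequence then becomes a fixed-length vector of 60 features, which is the kind of input that PCA or a classifier can consume directly.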


Summary

Introduction

Type A influenza viruses (IAVs) infect a wide range of avian and mammalian hosts, generally with species specificity. Avian influenza viruses (AIVs) typically persist in their natural reservoirs, waterfowl and shorebirds (Yoon et al 2014), in which they mostly cause subclinical infection (Webster et al 1978; Long et al 2019). AIVs sporadically infect mammalian hosts, such as swine (Pensaert et al 1981), human beings (Subbarao and Katz 2000; de Jong et al 2006; Lam et al 2013), and other mammals (White 2013; Lee et al 2017), but are incapable of intraspecies transmission (Tran et al 2004; Maines et al 2006; Long et al 2019). It is therefore of great importance to predict the adaptation of avian or swine IAVs to humans.
