Abstract

Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting various amino acid properties in predicting the antigenicity of influenza viruses with a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider the top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structural features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acid mutations driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequences available from the NCBI flu database, and showed an overall correspondence and local inconsistency between genetic and antigenic evolution of H3N2 influenza viruses.

Highlights

  • Causing an estimated 500,000 deaths worldwide per year, influenza epidemics in humans seriously endanger population health and world economy[1]

  • One of the most popular assays to evaluate the efficacy of a vaccine against an influenza virus is the hemagglutination inhibition (HI) assay, a binding assay measuring the ability of antisera to block the hemagglutinin (HA) of the antigen from agglutinating red blood cells[6]

  • We propose and test Joint Random Forest Regression (JRFR), a novel algorithm that combines multiple substitution matrices into the random forest algorithm to predict antigenic distances from HA1 protein sequences

Summary

Introduction

Causing an estimated 500,000 deaths worldwide per year, influenza epidemics in humans seriously endanger population health and the world economy[1]. Liao et al. tested four algorithms, including iterative filtering, multiple regression, logistic regression, and support vector machine, to predict antigenic variants from mutations in HA1, a subunit of HA that forms the globular domain[11]. They explored six amino acid substitution models based on physicochemical grouping of the 20 amino acids[10]. The predictive powers of the 94 physicochemical and biochemical properties of amino acids in AAindex could help elucidate their contribution to influenza antigenic evolution. There might be some advantages in predicting continuous antigenic distances using nonlinear models like random forest[19], since antigenic distances have higher resolution than binary values and the relationship among antigenic sites might be nonlinear.
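
The idea of describing a pair of HA1 sequences through several substitution matrices at once can be illustrated with a small feature-encoding sketch. This is not the authors' JRFR implementation. The two toy matrices and their values are made up for illustration (they are not AAindex entries), the helper names `pair_key` and `encode_pair` are hypothetical, and concatenating the per-site scores from each matrix is only one assumed way of considering matrices jointly. The resulting feature vector is the kind of input a regression model such as a random forest could be trained on.

```python
from typing import Dict, List, Tuple

# A substitution matrix stored as scores for unordered residue pairs.
Matrix = Dict[Tuple[str, str], float]

# Toy matrices with made-up values (assumption: NOT real AAindex data).
TOY_HYDROPHOBICITY: Matrix = {("K", "N"): 1.0, ("K", "T"): 2.0, ("N", "T"): 0.5}
TOY_VOLUME: Matrix = {("K", "N"): 0.3, ("K", "T"): 0.7, ("N", "T"): 0.2}


def pair_key(a: str, b: str) -> Tuple[str, str]:
    """Return the residue pair in sorted order so lookups are symmetric."""
    return (a, b) if a <= b else (b, a)


def encode_pair(seq1: str, seq2: str, matrices: List[Matrix]) -> List[float]:
    """Encode two aligned sequences as per-site substitution scores.

    For each matrix, every aligned site contributes one feature:
    0.0 if the residues are identical, otherwise the matrix score.
    Features from all matrices are concatenated (an assumed design).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    features: List[float] = []
    for matrix in matrices:
        for a, b in zip(seq1, seq2):
            features.append(0.0 if a == b else matrix[pair_key(a, b)])
    return features


if __name__ == "__main__":
    # Two hypothetical 3-residue fragments that differ at sites 2 and 3.
    print(encode_pair("KNT", "KTN", [TOY_HYDROPHOBICITY, TOY_VOLUME]))
```

With real data, each row of the training set would be one virus pair encoded this way, and the regression target would be the measured antigenic distance for that pair.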
