Abstract

The coronavirus pandemic has seriously affected public health and social order. Prediction methods based on machine learning can identify the infectivity phenotype and pandemic risk of a coronavirus. Currently, six types of coronaviruses that infect humans have been discovered, and their genome sequences differ significantly. Continuous genetic variation of the virus degrades the performance of machine learning models and can cause them to forget previously learned knowledge (learning forgetting). To address this challenge, we propose an incremental learning and knowledge distillation framework (ILKD). First, we employ Dna2Vec to extract virus features, encoding each virus sequence into a feature vector. Second, we use hierarchical clustering to continuously identify new coronavirus groups. Third, ILKD employs a combined strategy of incremental learning and knowledge distillation to adapt a Back Propagation (BP) neural network so that it continuously learns and predicts the human-to-human infection phenotype of coronaviruses. Experimental results show that ILKD effectively alleviates learning forgetting. Further analysis shows that ILKD outperforms other incremental learning models and has important value for public health applications.
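
To make the knowledge distillation idea concrete, the following is a minimal sketch of a generic distillation loss for a single sample. It is not the ILKD implementation: the abstract does not specify the loss, so the temperature-softened KL formulation, the function names, and the `alpha` and `temperature` parameters are all illustrative assumptions. The "teacher" stands for the network trained on earlier virus groups and the "student" for the network being updated on a new group.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical combined loss (assumed form, not taken from the paper).

    alpha weights the cross-entropy on the true label of the new sample;
    (1 - alpha) weights the KL divergence from the teacher's softened
    outputs to the student's, which penalizes drifting away from what
    the old model had learned.
    """
    # Cross-entropy of the student on the new sample's true label.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # KL(teacher || student) on temperature-softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The temperature**2 factor keeps the two terms on a comparable scale.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains; any disagreement with the teacher increases the loss, which is the mechanism by which distillation discourages forgetting.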
