Abstract

The recent outbreak of the COVID-19 pandemic, which has infected millions of people, has sparked a global health emergency. The causative virus was named SARS-CoV-2 because of its genetic similarity to SARS-CoV, the virus behind the 2003 coronavirus outbreak. While scientists work to find an effective vaccine for its prevention, the origin, mutation, and evolution of the virus, which help determine how its severity varies from area to area, are also of great concern. This work focuses on the second part and proposes the identification of protein strains associated with Saudi Arabia as opposed to strains associated with China. For this purpose, sequence data for both countries were collected from the UniProt database. The identification was carried out using machine learning: the biological data were transformed into numerical form by a hybrid approach combining Amino Acid Composition (AAC) and statistical moments, and classification was then performed using Random Forest and SVM. For model evaluation, 10-fold cross-validation and the jackknife test were applied, and the models achieved high accuracy. The high prediction accuracy reveals noticeable variation between the two sets of sequences, showing that the coronavirus strains of Saudi Arabia and China have diverged, which is a clear indication of sequence mutation.
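
The feature-encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the statistical moments are defined, so the code below assumes each residue is mapped to an arbitrary integer index (1 to 20) and that raw and central moments up to order 3 are taken over those indices. The names `aac`, `statistical_moments`, and `encode` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Amino Acid Composition: fraction of each standard residue."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def statistical_moments(sequence, max_order=3):
    """Raw moments (orders 1..max_order) followed by central moments
    (orders 2..max_order) of the residue indices.

    Assumption: residues are encoded as integers 1..20; the paper's
    actual encoding and moment types may differ.
    """
    index = {a: i + 1 for i, a in enumerate(AMINO_ACIDS)}
    values = [index[a] for a in sequence.upper() if a in index]
    n = len(values)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    mean = sum(values) / n
    raw = [sum(v ** k for v in values) / n for k in range(1, max_order + 1)]
    central = [
        sum((v - mean) ** k for v in values) / n
        for k in range(2, max_order + 1)
    ]
    return raw + central


def encode(sequence):
    """Hybrid feature vector: AAC followed by statistical moments."""
    return aac(sequence) + statistical_moments(sequence)
```

Under these assumptions each protein sequence becomes a fixed-length vector of 25 numbers (20 composition values plus 5 moments), regardless of sequence length. Such vectors could then be passed to a classifier, for example scikit-learn's `RandomForestClassifier` or `SVC`, and evaluated with 10-fold cross-validation.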
