Characterization of SARS-CoV-2 cases in Mexico using data mining

Enrique Luna-Ramírez, Edgar Aurelio Taya-Acosta, Apolinar Velarde-Martínez, Jorge Soria-Cruz

doi:10.35429/jca.2020.15.4.19.25

Abstract

This paper analyzes the data published by the Federal Government of Mexico on cases tested for the presence of the SARS-CoV-2 virus, which causes the COVID-19 disease. More than a million cases were analyzed, most of which tested positive. Twenty-one significant variables were considered, including the test result and whether the patient died, along with factors that complicate a person’s health such as diabetes, chronic obstructive pulmonary disease (COPD), asthma, hypertension, obesity and smoking, among others. The data were first prepared so that they could be processed with data mining techniques, following the CRISP-DM methodology for knowledge extraction. Using these techniques, data models were generated to characterize the development of COVID-19 at the national and local (state) levels. As an important part of the models, various rules or correlations were observed among the variables; these could be used to predict, in part, the future development of COVID-19 in Mexico and, consequently, to establish best practices aimed at reducing its social impact.
