Abstract

Methods: Our analysis and machine learning algorithm are based on the two most-cited clinical datasets in the literature: one from San Raffaele Hospital, Milan, Italy, and the other from Hospital Israelita Albert Einstein, São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and almost all of them turned out to be blood parameters. Exploratory Data Analysis (EDA) methods were applied to the datasets, and a comparative study of supervised machine learning models was carried out, after which the support vector machine (SVM) was selected as the best-performing model.

Results: As the best performer, SVM is used as our proposed supervised machine learning algorithm. After optimizing the SVM, an accuracy of 99.29%, a sensitivity of 92.79%, and a specificity of 100% were obtained on the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19). The same procedure was applied to the dataset from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM outperformed the other machine learning algorithms, with an accuracy of 92.86%, a sensitivity of 93.55%, and a specificity of 90.91%.

Conclusion: The results obtained are superior to others reported in the literature on these same datasets, leading us to conclude that our proposed solution is reliable for COVID-19 diagnosis.
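The accuracy, sensitivity, and specificity quoted in the abstract are standard ratios over the confusion matrix of a binary classifier. The following is a minimal sketch of how such figures are typically computed, assuming labels encoded as 1 (COVID-19 positive) and 0 (negative). The function names and the toy labels are illustrative assumptions and do not come from the paper or its datasets.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels (hypothetical, not taken from either hospital dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels there are 2 true positives, 1 false negative, 4 true negatives, and 1 false positive, so accuracy is 6/8, sensitivity is 2/3, and specificity is 4/5. Reporting all three matters for diagnosis because accuracy alone can hide a low sensitivity when negative cases dominate the dataset.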

Highlights

  • The novel coronavirus known as SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), responsible for the COVID-19 pandemic, belongs to the large family of coronaviruses that cause fever, cough, dyspnea, and muscle pain, while imaging frequently reveals bilateral pneumonia [1,2,3]

  • Due to the constant shortage of reagents for PCR tests, which are the tests for COVID-19 par excellence, several medical centers have opted for immunological tests that look for antibodies produced against this virus

  • We proposed a solution based on Data Analysis and Machine Learning to detect COVID-19 infections


Summary

Related Works

Several AI-based works, using ML and DL, have been carried out over the last two years on the diagnosis and detection of COVID-19 infections. In 2021, AlJame et al. [31] used routine blood tests and proposed an ensemble learning model for COVID-19 diagnosis. For data preparation, they used a K-Nearest Neighbors algorithm to handle null values in the dataset and an isolation forest method to remove outliers. With random forest (RF) as their best ML algorithm, they achieved an accuracy of 0.88, an F1-score of 0.76, a sensitivity of 0.66, a specificity of 0.91, and an AUROC of 0.86. They found that COVID-19 patients can be divided into subtypes based on the serum levels of immune cells, gender, and reported symptoms. They trained an XGBoost model that can distinguish COVID-19 patients from influenza patients with a sensitivity of 92.5% and a specificity of 97.9%. We optimize the SVM algorithm to achieve performance superior to all algorithms reported in the literature on the same datasets.
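The K-Nearest Neighbors treatment of null values mentioned above can be illustrated with a small sketch. It assumes numeric features with missing entries encoded as None, and fills each gap with the mean of that feature over the k closest rows, measured on the features both rows have observed. The function name, the distance choice, and the toy rows are assumptions made for illustration. This is not the implementation used in [31], and the isolation forest outlier step is not covered here.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows."""

    def distance(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]  # work on a copy, keep the input intact
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows that actually observed feature j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy rows (hypothetical values, two features per row; last row has a gap).
rows = [
    [1.0, 10.0],
    [1.1, 12.0],
    [5.0, 50.0],
    [1.05, None],
]
imputed = knn_impute(rows, k=2)
print(imputed[3])
```

In the toy data the incomplete row is closest to the first two rows on its observed feature, so the gap is filled with the mean of 10.0 and 12.0. A production pipeline would more likely rely on an established library imputer and scale the features first, since unscaled blood parameters with large ranges would otherwise dominate the distance.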

Proposed Approach
Exploratory Data Analysis
Evaluation
Results
Optimization Results of the Best Model
Discussions
