Abstract

With the modernization of machine learning techniques in healthcare, different innovations including support vector machine (SVM) have predominantly played a major role in classifying lung cancer, predicting coronavirus disease 2019, and other diseases. In particular, our algorithm focuses on integrated datasets as compared with other existing works. In this study, parallel-based SVM (P-SVM) and multiclass-based multiple submodels (MMSM-SVM) were used to analyze the optimal classification of lung diseases. This analysis aimed to find the optimal classification of lung diseases with id and stages, such as key-value pairs in MapReduce combined with P-SVM and MMSVM for binary and multiclasses, respectively. For non-linear classification, kernel clustering-based SVM embedded with multiple submodels was developed. Both algorithms were developed using Apache spark environment, and data for the analysis were retrieved from microscope lab, UCI, Kaggle, and General Thoracic surgery database along with some electronic health records related to various lung diseases to increase the dataset size to 5 GB. Performance measures were conducted using a 5 GB dataset with five nodes. Dataset size was finally increased, and task analysis and CPU utilization were measured.

Highlights

  • Big data plays a vital role in analyzing extremely large data sets with reduced complexity and efficient analysis

  • Prediction of coronavirus disease 2019 (COVID-19) is mandatory to prevent the risk of spread, and predetermination of lung cancer stages is mandatory to determine lung cells damaged in increasing stages [2]

  • The difference is that P-support vector machine (SVM) classifies well in binary classification

Read more

Summary

Introduction

Big data plays a vital role in analyzing extremely large data sets with reduced complexity and efficient analysis. With enhanced techniques of big data, a large amount of data has been handled in parallel. Data classification has been performed using salient solutions. Data with exponential growth are complex and challenging to classify [1]. Prediction of coronavirus disease 2019 (COVID-19) is mandatory to prevent the risk of spread, and predetermination of lung cancer stages is mandatory to determine lung cells damaged in increasing stages [2]. In medical science, affected parts can be retrieved and used to diagnose early stages of the disease [3]. Biopsy is the initial step in diagnosis; during this process, cells are sampled

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call