Abstract

Healthcare data is being collected at a rapidly growing scale, and detecting lung cancer at an early stage demands accurate prediction; machine learning can address both needs. In this paper, we combine machine learning algorithms with Apache Spark in an architecture for classifying lung cancer images and the stages of the disease. We experiment with a combination of binary classification (a non-linear SVM with a Radial Basis Function (RBF) kernel) and multi-class classification (WTA-SVM, a winner-takes-all support vector machine) with a threshold technique (T-BMSVM) to classify nodules as malignant or benign and to determine their malignancy levels, respectively. The dataset consists of sputum cell images collected from laboratory microscope images. We argue for handling and processing large sputum cell image data sets for classification using the MapReduce framework in MATLAB and PySpark, the latter of which works better with Apache Spark. Our approach outperforms the other methods, remaining stable even as the dataset size grows sharply and keeping the error rate to a minimum. It achieves 86% accuracy and an AUC of 0.88; these metrics, together with the misclassification rate, indicate that the Support Vector Machine (SVM) outperforms other classifiers. These outcomes show that features were successfully extracted from the lung cancer images and that SVM combined with binary classification, with multi-class classification working even better than SVM alone, may therefore be considered a promising tool for diagnosing the stages of nodules and classifying the severity of cancer. Scalability and convergence analyses are also included to demonstrate the improvement of multi-class classification over SVM.
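
The two-stage decision described above (a binary malignant/benign call, followed by a winner-takes-all choice among malignancy levels) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `classify_nodule`, the level names, the score values, and the way the threshold is applied to the binary score are all assumptions, since the abstract does not specify how T-BMSVM sets or uses its threshold. The scores stand in for SVM decision values that would come from trained classifiers.

```python
from typing import Dict, Optional, Tuple


def classify_nodule(
    binary_score: float,
    level_scores: Dict[str, float],
    threshold: float = 0.0,
) -> Tuple[str, Optional[str]]:
    """Two-stage decision: benign/malignant first, then malignancy level.

    binary_score: decision value of a binary classifier (assumed: larger
        values mean "more likely malignant").
    level_scores: one decision value per malignancy level, e.g. from
        one-vs-rest classifiers (hypothetical level names).
    threshold: cut-off on the binary score (assumed rule, not taken from
        the paper).
    """
    # Stage 1: a score at or below the threshold is treated as benign,
    # so no malignancy level is assigned.
    if binary_score <= threshold:
        return ("benign", None)

    # Stage 2: winner-takes-all, i.e. pick the level whose classifier
    # produced the largest decision value.
    winner = max(level_scores, key=lambda level: level_scores[level])
    return ("malignant", winner)


# Hypothetical decision values, for illustration only.
print(classify_nodule(-0.4, {"low": 0.1, "medium": 0.3, "high": -0.2}))
print(classify_nodule(1.2, {"low": -0.5, "medium": 0.3, "high": 0.9}))
```

Keeping the two stages separate means the level classifiers are consulted only for nodules that pass the binary check, which mirrors the order in which the abstract describes the two classification tasks.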
