Comparative Evaluation of Machine Learning Algorithms on Lung Cancer Type Classification from DNA Microarray Data

Ferid Ben Ali, Maria Braoudaki, Sola Adeleke, Iosif Mporas, Doraid Alrifai

doi:10.1109/bia52594.2022.9831234

Abstract

Machine learning algorithms, such as Support Vector Machines and deep learning, can assist in diagnosing diseases. This paper evaluates classification algorithms that distinguish among four lung cancer types and healthy controls using DNA microarray data. The microarray data were collected from the Gene Expression Omnibus (GEO) database. Five well-known and widely used machine learning classification algorithms are compared, both with and without the Synthetic Minority Oversampling Technique (SMOTE) for data oversampling. Principal Component Analysis (PCA) was applied to reduce the number of features in the microarray data, and the classifiers were evaluated in terms of F1 score for different numbers of PCA-based features. SMOTE oversampling was found to improve overall performance by approximately 1%. The best-performing algorithms were Support Vector Machines and Deep Neural Networks, with F1 scores of 89.43 and 89.03, respectively.
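The preprocessing and evaluation steps summarized above (PCA for dimensionality reduction, SMOTE for class balancing, F1 for scoring) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `pca_fit`, `pca_transform`, `smote` and `macro_f1` are hypothetical. The sketch also assumes three things the abstract does not specify: SMOTE oversamples every class up to the majority-class count, neighbours are found with Euclidean distance, and F1 is macro-averaged over classes. The classifiers themselves (SVM, DNN and the others) are omitted. In a real evaluation, PCA and SMOTE would be fitted on the training split only, so that no test information leaks into the synthetic samples.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return (mean, components) of the top principal axes of X (via SVD)."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto previously fitted principal axes."""
    return (X - mean) @ components.T


def smote(X, y, k=5, rng=None):
    """Minimal SMOTE sketch (assumption: balance all classes to the majority count).

    Each synthetic sample is placed at a random point on the segment between a
    sample of class c and one of its k nearest same-class neighbours (Euclidean).
    X is a 2-D NumPy array and y a 1-D NumPy array of class labels.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for c, n in zip(classes, counts):
        n_new = target - n
        if n_new == 0:
            continue
        Xc = X[y == c]
        if len(Xc) < 2:
            raise ValueError(f"class {c!r} needs at least 2 samples for SMOTE")
        k_eff = min(k, len(Xc) - 1)
        # Pairwise distances within the class; exclude each point from its own neighbours
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        # Pick a random base sample and a random one of its k nearest neighbours
        base = rng.integers(0, len(Xc), size=n_new)
        nb = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_out.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_out.append(np.full(n_new, c))
    return np.vstack(X_out), np.concatenate(y_out)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, imbalanced stand-in for expression data (NOT the paper's data):
    # 5 classes, e.g. four cancer types plus healthy controls, 500 "genes"
    counts = [30, 12, 9, 6, 5]
    y = np.repeat(np.arange(5), counts)
    X = rng.normal(size=(y.size, 500)) + y[:, None] * 0.5
    mean, comps = pca_fit(X, n_components=20)
    Z = pca_transform(X, mean, comps)
    Z_bal, y_bal = smote(Z, y, k=5, rng=0)
    print(np.unique(y_bal, return_counts=True))
```

Generating new samples by interpolation, rather than duplicating existing ones, is what distinguishes SMOTE from plain random oversampling. Running it on PCA-reduced features, as this sketch does, also keeps the nearest-neighbour search in a low-dimensional space.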

Full Text