Abstract

Machine learning algorithms, such as Support Vector Machines or deep learning, can help in diagnosing diseases. An evaluation of classification algorithms for distinguishing among four lung cancer types and healthy controls from DNA microarray data is presented. The microarray data were collected from the Gene Expression Omnibus database. Five well-known and widely used machine learning classification algorithms were compared, with and without the Synthetic Minority Oversampling Technique (SMOTE) for data oversampling. Principal Component Analysis (PCA) was applied to reduce the number of features in the microarray data, and the classifiers were tested with different numbers of PCA-based features in terms of their F1 score. SMOTE oversampling was found to improve overall performance by approximately 1%. The best-performing algorithms were Support Vector Machines and Deep Neural Networks, with F1 scores of 89.43 and 89.03, respectively.
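The evaluation described above (PCA feature reduction, optional minority-class oversampling, a classifier, and F1 scoring) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. It rests on several assumptions: the data are synthetic (generated with scikit-learn's `make_classification`, not taken from the Gene Expression Omnibus); `smote_like_oversample` is a hypothetical, simplified SMOTE-style interpolation written here in NumPy, not the reference SMOTE implementation; oversampling is applied to the training split only, after PCA projection; and the F1 score is macro-averaged. The abstract does not specify the step order or the averaging mode.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Every class is brought up to the majority-class size by interpolating
    between a sample and one of its k nearest same-class neighbours.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue  # already at target size, or too few samples to interpolate
        X_cls = X[y == cls]
        # Pairwise distances within the class; the diagonal is masked so a
        # sample is never chosen as its own neighbour.
        dists = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        k_eff = min(k, count - 1)
        neighbours = np.argsort(dists, axis=1)[:, :k_eff]
        base = rng.integers(0, count, size=n_new)
        picked = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def evaluate(n_components, use_oversampling, seed=0):
    """Return the macro F1 of an RBF SVM on a synthetic 5-class problem."""
    # Synthetic, imbalanced stand-in for microarray data (assumption).
    X, y = make_classification(
        n_samples=400, n_features=200, n_informative=20, n_classes=5,
        weights=[0.4, 0.25, 0.15, 0.12, 0.08], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Fit scaling and PCA on the training split only to avoid leakage.
    scaler = StandardScaler().fit(X_tr)
    pca = PCA(n_components=n_components, random_state=seed)
    Z_tr = pca.fit_transform(scaler.transform(X_tr))
    Z_te = pca.transform(scaler.transform(X_te))
    if use_oversampling:
        Z_tr, y_tr = smote_like_oversample(Z_tr, y_tr, seed=seed)
    clf = SVC(kernel="rbf").fit(Z_tr, y_tr)
    return f1_score(y_te, clf.predict(Z_te), average="macro")


if __name__ == "__main__":
    # Sweep the number of PCA components, with and without oversampling.
    for n in (5, 10, 20, 40):
        plain = evaluate(n, use_oversampling=False)
        balanced = evaluate(n, use_oversampling=True)
        print(f"{n:>3} components: F1 plain={plain:.3f}, oversampled={balanced:.3f}")
```

In practice, the `SMOTE` class from the imbalanced-learn package would likely be a better choice than the hand-written helper. The scores produced on synthetic data say nothing about the figures reported in the abstract.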
