Abstract

Non-small cell lung cancer (NSCLC) is the most common type of lung cancer. Identifying the genes associated with NSCLC may contribute to its treatment, and many studies have therefore pursued this goal. In some of these studies, gene expression data are obtained by microarray analysis and shared publicly in databases such as the NCBI Gene Expression Omnibus (GEO). In today's big data era, machine learning algorithms are frequently used to extract valuable information from large data collections. In this study, all six microarray datasets in the NCBI GEO database related to NSCLC and drug resistance were analyzed in RStudio. Each dataset was analyzed separately with support vector machine, k-nearest neighbor, naïve Bayes, random forest, C5.0 decision tree, multilayer perceptron, and artificial neural network (with a principal component analysis step) algorithms. The associated genes were determined through the caret package, and the top 10 genes for each algorithm are listed in order of importance in the Findings section. In the resulting gene table, ELOVL7, HMGA2, SAT1, RRM1, IER3, SLC7A11, and U2AF1 appear in at least two different datasets. These genes are recommended for experimental validation by researchers working in wet laboratories.
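The final step, picking out genes that appear in the top-ranked lists of at least two datasets, can be sketched as follows. This is a minimal illustration, not the authors' code: the study used R and caret, whereas this sketch uses plain Python, and it assumes the per-dataset, per-algorithm top-10 lists have already been collected into a nested dictionary. The function name `genes_in_multiple_datasets`, the dataset keys, and the gene names `GENE_A` to `GENE_D` are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict


def genes_in_multiple_datasets(top_genes, min_datasets=2):
    """Return genes found in the top-ranked lists of >= min_datasets datasets.

    Hypothetical input layout (an assumption, not from the study):
        {dataset_id: {algorithm_name: [ranked gene symbols, ...]}}
    A gene selected by several algorithms within ONE dataset counts once
    for that dataset; only distinct datasets are counted.
    """
    datasets_per_gene = defaultdict(set)
    for dataset_id, by_algorithm in top_genes.items():
        for ranked_genes in by_algorithm.values():
            for gene in ranked_genes:
                datasets_per_gene[gene].add(dataset_id)
    # Keep genes supported by enough distinct datasets, in alphabetical order.
    return sorted(
        gene for gene, datasets in datasets_per_gene.items()
        if len(datasets) >= min_datasets
    )


# Hypothetical placeholder lists (truncated well below 10 genes for brevity).
example = {
    "dataset_1": {"svm": ["GENE_A", "GENE_B"], "random_forest": ["GENE_C"]},
    "dataset_2": {"knn": ["GENE_A", "GENE_D"]},
    "dataset_3": {"c50": ["GENE_C", "GENE_A"]},
}

print(genes_in_multiple_datasets(example))  # ['GENE_A', 'GENE_C']
```

Counting distinct datasets rather than algorithm hits reflects the criterion stated above ("at least two different datasets"): a gene chosen by every algorithm on a single dataset would still not qualify.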
