Abstract

Automatic lung cancer classification remains a challenging problem for researchers because gene expression data are noisy and high-dimensional, and only small sample sizes are available. To address these problems, an enhanced gene selection algorithm and a multiclass classifier are developed. In this research, lung cancer gene expression datasets (GEO IDs: GSE10245, GSE19804, GSE7670, GSE10072, and GSE6044) were collected from the Gene Expression Omnibus (GEO) database. After the data were acquired, gene selection was carried out with an enhanced ReliefF algorithm to select the optimal genes. In the enhanced ReliefF algorithm, the earth mover's distance measure and a firefly optimizer are used instead of the Manhattan distance measure to identify the nearest-hit and nearest-miss instances, which significantly reduces the “curse of dimensionality” problem. The selected genes were then given as input to a Multiclass Support Vector Machine (MSVM) classifier to classify the sub-classes of lung cancer. The experimental results showed that the proposed system improved classification accuracy by 3-10% compared with existing systems, as measured by accuracy, False Positive Rate (FPR), error rate, and True Positive Rate (TPR).
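The abstract does not give the exact formulation of the enhanced ReliefF step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows standard multiclass ReliefF feature weighting with a pluggable distance, where nearest hits and misses are found with a one-dimensional earth mover's distance between samples. Treating each sample's expression values as an empirical distribution is an assumption made here; the firefly optimizer and the MSVM stage are omitted, and the names `emd_1d`, `relieff`, and `top_genes` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]


def emd_1d(a: Vector, b: Vector) -> float:
    """Earth mover's (Wasserstein-1) distance between two equal-length samples.

    ASSUMPTION: each expression profile is treated as an empirical distribution
    of values. With equal sizes and uniform weights, the optimal transport plan
    pairs the i-th smallest value of `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def relieff(
    X: List[Vector],
    y: List[int],
    k: int = 1,
    distance: Callable[[Vector, Vector], float] = emd_1d,
) -> List[float]:
    """Return one ReliefF weight per gene (column of X); higher means more relevant."""
    n, n_genes = len(X), len(X[0])
    priors = {c: cnt / n for c, cnt in Counter(y).items()}

    # Per-gene value range, used to scale diff() into [0, 1].
    spans = []
    for g in range(n_genes):
        col = [row[g] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(g: int, a: int, b: int) -> float:
        return abs(X[a][g] - X[b][g]) / spans[g]

    weights = [0.0] * n_genes
    for i in range(n):  # every sample acts as R_i (m = n), so the result is deterministic
        # Rank all other samples by the chosen distance to sample i.
        others = sorted((j for j in range(n) if j != i), key=lambda j: distance(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        for g in range(n_genes):
            for h in hits:
                weights[g] -= diff(g, i, h) / (n * k)
        # Misses from each other class, weighted by that class's prior.
        for c, p in priors.items():
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = p / (1.0 - priors[y[i]])
            for g in range(n_genes):
                for m in misses:
                    weights[g] += scale * diff(g, i, m) / (n * k)
    return weights


def top_genes(weights: List[float], n_top: int) -> List[int]:
    """Indices of the n_top highest-weighted genes."""
    return sorted(range(len(weights)), key=lambda g: weights[g], reverse=True)[:n_top]
```

For example, with two classes where gene 0 separates the samples and gene 1 is near-constant noise, `relieff` assigns gene 0 a clearly higher weight, and `top_genes(weights, 1)` returns `[0]`. In a full pipeline, the selected gene indices would then be used to subset the expression matrix before training a multiclass classifier such as a one-vs-rest or one-vs-one SVM.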
