Classification of Gene Expression Data Using Feature Selection Based on Type Combination Approach Model With Advanced Feature Selection Technology

Siddesh G M, Gururaj T

doi:10.4018/ijcini.20211001.oa46

Abstract

A key step in addressing the classification problem is gene selection, which removes redundant and irrelevant genes. The proposed Type Combination Approach – Feature Selection (TCA-FS) model combines efficient feature selection methods to enhance classification accuracy. Three classifiers, K Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Forest (RF), are used to evaluate the chosen feature selection methods and their prediction accuracy. The effects of three new feature selection approaches are analysed: Improved Recursive Feature Elimination (IRFE), Revised Maximum Information Co-efficient (RMIC) and Upgraded Masked Painter (UMP). These three proposed techniques are compared with existing techniques and validated by (i) a stability determination test, (ii) classification accuracy and (iii) error rates. Owing to the selection of a proper classification threshold, the proposed TCA-FS method provides higher accuracy than the existing system.
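The abstract does not say how the stability determination test is computed. As an illustration only, one common way to quantify selection stability is the mean pairwise Jaccard similarity between the feature subsets chosen on different resamples of the data. The sketch below assumes that measure; the function names and the example subsets are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity over a list of selected-feature subsets.

    Assumption: this is one possible stability measure, not necessarily
    the one used for TCA-FS.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene indices selected on three resampled training sets.
runs = [{0, 3, 7, 9}, {0, 3, 7, 8}, {0, 3, 5, 9}]
print(round(selection_stability(runs), 3))
```

A value near 1 means the selector picks almost the same genes on every resample; a value near 0 means the selection depends heavily on which samples were drawn.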

Highlights

Gene expression data pose problems such as high dimensionality, irrelevant features and noise
The proposed improved recursive feature elimination (IRFE), revised maximum information co-efficient (RMIC) and upgraded masked painter (UMP) methods achieve more precise results than other current approaches
Too many features lead to data redundancy and noise because of the correlation between different features, which reduces classification performance
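To make the redundancy point in the last highlight concrete, the following is a minimal sketch of a greedy correlation filter: a feature (gene) is kept only if its absolute Pearson correlation with every already-kept feature is below a threshold. This is a generic filter shown for illustration, not the IRFE, RMIC or UMP method; the function names, the 0.9 threshold and the toy expression values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def drop_redundant(features, threshold=0.9):
    """Return the names of features kept by a greedy correlation filter.

    features: dict mapping feature name -> list of values across samples.
    A feature is kept only if |r| with every previously kept feature is
    below the (assumed) threshold.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression values for three hypothetical genes over five samples.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 4.0, 6.2, 7.9, 10.1],  # almost exactly 2 * gene_a
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(expr))
```

Here gene_b is dropped because it is nearly a scaled copy of gene_a, while gene_c is only weakly correlated with gene_a and is retained. The result depends on the order in which features are visited, which is one reason such filters are usually combined with a relevance ranking.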

Summary

Introduction

Gene expression data pose several challenges, for example selecting the best feature extraction method and reducing the dimensionality of the data. An efficient dimension reduction technique must be chosen to reduce the number of non-relevant features present in the dataset. Gene selection is an important step in removing non-essential elements, which improves precision (Lamba et al., 2018). Gene expression datasets contain irrelevant features and noisy data. Statistical approaches offer an effective solution to this problem, and automatic statistical computation is required to avoid the errors introduced by manual calculation. Such problems can be addressed using machine learning methods.

Methods

Results

Conclusion