Abstract

Gene Expression is the process of determining the physical characteristics of living beings by generating the necessary proteins. Gene Expression takes place in two steps, translation and transcription. It is the flow of information from DNA to RNA with enzymes’ help, and the end product is proteins and other biochemical molecules. Many technologies can capture Gene Expression from the DNA or RNA. One such technique is Microarray DNA. Other than being expensive, the main issue with Microarray DNA is that it generates high-dimensional data with minimal sample size. The issue in handling such a heavyweight dataset is that the learning model will be over-fitted. This problem should be addressed by reducing the dimension of the data source to a considerable amount. In recent years, Machine Learning has gained popularity in the field of genomic studies. In the literature, many Machine Learning-based Gene Selection approaches have been discussed, which were proposed to improve dimensionality reduction precision. This paper does an extensive review of the various works done on Machine Learning-based gene selection in recent years, along with its performance analysis. The study categorizes various feature selection algorithms under Supervised, Unsupervised, and Semi-supervised learning. The works done in recent years to reduce the features for diagnosing tumors are discussed in detail. Furthermore, the performance of several discussed methods in the literature is analyzed. This study also lists out and briefly discusses the open issues in handling the high-dimension and less sample size data.

Highlights

  • Deoxy-ribonucleic Acid (DNA) is a hereditary material containing the genetic information, usually found in the cell’s nucleus

  • Hybrid and Embedded methods are the recent developments in Gene Selection. (a) Filter Feature Selection Approach: Filter helps in identifying the specific abilities of features depending on the inherent properties of the data

  • The examples for filter techniques are Pearson Correlation, Fisher Score, Model-based Ranking, and Mutual Information (Lazar et al, 2012) were done in a detailed survey on the filter techniques applied to Gene Expression Microarray data

Read more

Summary

Introduction

Deoxy-ribonucleic Acid (DNA) is a hereditary material containing the genetic information, usually found in the cell’s nucleus. Filter-wrapper hybrid approaches are focused on supervised as well as unsupervised gene selection.

Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call