Abstract

Disease diagnosis and prediction methods in biotechnology and medicine have advanced significantly over time. In particular, analyzing raw gene expression data is crucial for identifying diseases such as cancer. Microarrays are a tool that records gene expression from deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). This technique has a distinctive characteristic: it generates high-dimensional data (many genes) from only a small number of samples. Classification models trained on such datasets are prone to overfitting. This limitation can be mitigated by reducing the dimensionality of microarray datasets to a manageable number of features. Machine learning (ML)-based data reduction has recently attracted considerable attention in genomic research. This review therefore examines recent studies that present state-of-the-art data reduction and classification algorithms for diagnosing tumors from microarray gene expression data, and analyzes their performance. To the best of our knowledge, this is the first review to provide a comprehensive view of three topics: data preprocessing; dimensionality reduction, covering feature (i.e., gene) selection, feature extraction, and their hybrids; and ML algorithms. The paper is structured as follows. First, it summarizes several data preprocessing methods applied to gene expression datasets. Then, it provides a detailed review of ML-based feature selection algorithms, including filter, wrapper, embedded, ensemble, and hybrid algorithms, examined under three main classes: supervised, unsupervised, and semisupervised ML. Next, it thoroughly reviews feature extraction algorithms and hybrids of feature extraction and selection. Furthermore, it presents a detailed review of ML algorithms broadly applied to tumor versus nontumor classification using microarray datasets. Finally, it highlights the challenges and open questions related to gene expression datasets for accurate cancer classification and detection.
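
To make the high-dimensional, small-sample problem and the filter category of gene selection concrete, the following is a minimal sketch rather than a method taken from the reviewed studies. It assumes a two-class (tumor vs. normal) problem, uses the absolute Welch t-statistic as one common filter criterion, and runs on synthetic data; the function names `welch_t_scores` and `select_top_k` are hypothetical and chosen only for illustration.

```python
import numpy as np

def welch_t_scores(X, y):
    """Absolute Welch t-statistic per gene (column) between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Unequal-variance standard error of the difference in means
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(mean_diff) / (se + 1e-12)  # small constant avoids division by zero

def select_top_k(X, y, k):
    """Filter-style selection: indices of the k genes with the highest scores."""
    scores = welch_t_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Synthetic microarray-like data: 40 samples x 5000 genes (features >> samples)
rng = np.random.default_rng(0)
n_samples, n_genes, n_informative = 40, 5000, 10
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0] * 20 + [1] * 20)          # 0 = normal, 1 = tumor (assumed labels)
X[y == 1, :n_informative] += 3.0           # only the first 10 genes differ between classes

selected = select_top_k(X, y, k=10)
print(sorted(selected.tolist()))
```

Because a filter scores each gene independently of any classifier, it is cheap enough for thousands of genes. In a real evaluation, the selection step would need to be repeated inside each cross-validation fold; selecting genes on the full dataset before cross-validation leaks label information and inflates the estimated accuracy.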
