Gene Selection and Classification of scRNA-seq Data Combining Information Gain Ratio and Genetic Algorithm with Dynamic Crossover

Junhong Feng, Jian-Hong Wang, Jie Zhang, Xishuan Niu, Chao-Yang Lee

doi:10.1155/2022/9639304

Abstract

Single-cell RNA sequencing (scRNA-seq) is emerging as a promising technology. A scRNA-seq dataset contains a huge number of genes. Some of these genes are of high quality, while others are noisy or irrelevant owing to nonspecific technical factors. Noisy and irrelevant genes can strongly influence downstream analyses such as cell classification, gene function analysis, and cancer biomarker detection. It is therefore important to remove irrelevant genes and retain high quality genes by means of gene selection. In this study, a novel gene selection and classification method is presented that combines the information gain ratio with a genetic algorithm with dynamic crossover (abbreviated as IGRDCGA). The information gain ratio (IGR) is employed to coarsely eliminate irrelevant genes and obtain a preliminary gene subset, and the genetic algorithm with dynamic crossover (DCGA) is then used to finely select high quality genes from that preliminary subset. The main difference between the IGRDCGA and existing methods is that the DCGA and IGR are integrated for the first time and applied to gene selection from scRNA-seq data. We run the IGRDCGA and several competing methods on real-world scRNA-seq datasets. The results demonstrate that the IGRDCGA selects high quality genes effectively and efficiently and outperforms the competing methods in terms of both dimensionality reduction and classification accuracy.
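The coarse filtering step can be illustrated with a minimal sketch. It uses the standard definition of the information gain ratio, IGR = (H(labels) - H(labels | gene)) / H(gene), and assumes each gene's expression has already been discretized into levels such as "hi" and "lo". The discretization, the function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain_ratio(gene, labels):
    """Information gain of `gene` about `labels`, divided by the gene's own entropy."""
    total = len(labels)
    groups = {}
    for level, label in zip(gene, labels):
        groups.setdefault(level, []).append(label)
    # H(labels | gene): entropy of the labels within each expression level, weighted by size
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(gene)
    if split_info == 0:
        return 0.0  # a gene with a single level carries no information
    return (entropy(labels) - conditional) / split_info


def igr_filter(expression, labels, keep):
    """Return the sorted indices of the `keep` genes with the highest IGR.

    `expression` is a list of genes, each a list of discretized levels per cell.
    """
    scores = [(information_gain_ratio(gene, labels), i) for i, gene in enumerate(expression)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(i for _, i in scores[:keep])


# Toy data (hypothetical): four cells of two types, four discretized genes.
labels = ["A", "A", "B", "B"]
expression = [
    ["hi", "hi", "lo", "lo"],  # separates the two cell types perfectly
    ["hi", "lo", "hi", "lo"],  # independent of the cell type
    ["lo", "lo", "lo", "lo"],  # constant
    ["hi", "hi", "hi", "lo"],  # partially informative
]
preliminary_subset = igr_filter(expression, labels, keep=2)
```

In this toy example the perfectly separating gene and the partially informative gene are retained, while the constant gene and the gene independent of the cell type are discarded.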

Highlights

In scRNA-seq data, the number of genes is often very large and may reach tens of thousands.
We present a novel algorithm that addresses gene selection and classification for scRNA-seq data by combining the information gain ratio and a genetic algorithm with dynamic crossover (IGRDCGA for short). The coding and the other details of the IGRDCGA are as follows.
To evaluate the performance of the IGRDCGA, it is compared with two frequently used clustering algorithms, k-means and spectral clustering [57], and with a state-of-the-art single-cell classification algorithm, SIMLR [58].

Summary

Introduction

In scRNA-seq data, the number of genes is often very large and may reach tens of thousands. Some genes are irrelevant or unsuitable for classification tasks, and they may seriously affect the efficiency of downstream data analysis. To remove these irrelevant genes and select high quality genes, an effective and efficient gene selection algorithm is vital. Eroglu and Kilic [3] integrated a genetic local search algorithm and a k-nearest neighbor classifier to select a feature subset. How to correctly use the GA to address gene selection and classification of scRNA-seq data is the first significant issue to consider. This study integrates the IGR and DCGA to address gene selection and classification of scRNA-seq data and proposes a novel gene selection and classification algorithm. The IGRDCGA uses the IGR to coarsely eliminate irrelevant genes and obtain a preliminary gene subset, and then employs the DCGA to finely select high quality genes from that preliminary subset.
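The fine selection step can be sketched as a genetic algorithm over binary gene masks. This summary does not specify how the paper's dynamic crossover works, so the sketch below assumes one simple possibility: a crossover probability that decays linearly over the generations. The fitness function, the parameter names (`pc_start`, `pc_end`, `pm`), and the toy set of informative genes are likewise hypothetical; in practice the fitness would more plausibly be a classification accuracy computed on the selected genes.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_dcga(n_genes, fitness, generations=30, pop_size=20,
             pc_start=0.9, pc_end=0.5, pm=0.02, seed=0):
    """Evolve binary masks (1 = gene kept); return the best mask and fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for gen in range(generations):
        # Assumed "dynamic" schedule: crossover probability shrinks linearly over time.
        progress = gen / max(1, generations - 1)
        pc = pc_start + (pc_end - pc_start) * progress
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            c1, c2 = p1[:], p2[:]
            if n_genes > 1 and rng.random() < pc:
                cut = rng.randint(1, n_genes - 1)  # single-point crossover
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(n_genes):
                    if rng.random() < pm:  # bit-flip mutation
                        child[i] = 1 - child[i]
                children.append(child)
        population = children[:pop_size]
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: reward keeping "informative" genes, penalize subset size.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, fitness_history = run_dcga(n_genes=8, fitness=toy_fitness)
```

Because the best mask is copied into each new generation, the recorded best fitness never decreases. The size penalty in the toy fitness mirrors the stated goal of reducing dimensionality while preserving classification quality.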

Related Work

Evaluation Metrics

Dataset and Preprocessing

The Proposed Algorithm

Numerical Results

Results

Conclusion and Future Work


Journal: Wireless Communications and Mobile Computing	Publication Date: Jan 31, 2022
Citations: 3	License type: CC BY 4.0
