Classifying Incomplete Gene-Expression Data: Ensemble Learning with Non-Pre-Imputation Feature Filtering and Best-First Search Technique.

Yuanting Yan, Yiwen Zhang, Yanping Zhang, Meili Yang, Tao Dai, Xiuquan Du

doi:10.3390/ijms19113398

Open Access

https://doi.org/10.3390/ijms19113398

Journal: International Journal of Molecular Sciences
Publication Date: Oct 30, 2018
Citations: 2
License type: CC BY 4.0

Affiliation: Anhui University

Abstract

(1) Background: Gene-expression data usually contain missing values (MVs). Numerous methods for estimating MVs have been proposed in the past few years. Recent studies show that the choice of imputation algorithm makes little difference to classification. Thus, some scholars believe that selecting informative genes for downstream classification is more important than how MVs are imputed. However, most feature-selection (FS) algorithms require prior imputation, and the impact of prior MV imputation on downstream FS performance is seldom considered. (2) Method: A modified chi-square test-based FS method is introduced for gene-expression data. To deal with the small sample size of gene-expression data, a heuristic method called recursive element aggregation is proposed in this study. Our approach handles incomplete data directly, without any imputation method or missing-data assumption. The most informative genes are selected through a threshold. After that, a best-first search strategy is used to find optimal feature subsets for classification. (3) Results: We compare our method with several FS algorithms. Evaluation is performed on twelve original incomplete cancer gene-expression datasets. We demonstrate that MV imputation on an incomplete dataset affects subsequent FS in terms of classification performance. By conducting FS directly on incomplete data, our method avoids the potential disturbance to subsequent FS procedures caused by MV imputation. An experiment on the small round blue cell tumor (SRBCT) dataset showed that our method found additional genes beyond the many genes it shares with the two existing methods compared.
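
The idea of filtering genes on incomplete data without imputing first can be illustrated with a small sketch. This is not the paper's modified chi-square test or its recursive element aggregation heuristic; it is a plain Pearson chi-square statistic computed per gene over the observed entries only. The median split used for discretization, the function names `chi2_observed_only` and `select_genes`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median


def chi2_observed_only(values, labels):
    """Pearson chi-square statistic of one gene against the class labels.

    Entries that are None or NaN are skipped rather than imputed.
    Observed values are split at their median into low/high bins
    (an illustrative discretization, not the paper's procedure).
    """
    pairs = [(v, c) for v, c in zip(values, labels)
             if v is not None and not math.isnan(v)]
    if len(pairs) < 2:
        return 0.0
    cut = median(v for v, _ in pairs)
    # Contingency table: (is_high, class) -> count, observed samples only.
    table = Counter((v > cut, c) for v, c in pairs)
    n = len(pairs)
    row_tot, col_tot = Counter(), Counter()
    for (is_high, cls), count in table.items():
        row_tot[is_high] += count
        col_tot[cls] += count
    stat = 0.0
    for is_high in row_tot:
        for cls in col_tot:
            expected = row_tot[is_high] * col_tot[cls] / n
            stat += (table[(is_high, cls)] - expected) ** 2 / expected
    return stat


def select_genes(X, y, threshold):
    """Score every column of X and keep those scoring above the threshold."""
    n_genes = len(X[0])
    scores = [chi2_observed_only([row[j] for row in X], y)
              for j in range(n_genes)]
    selected = [j for j, s in enumerate(scores) if s > threshold]
    return selected, scores


# Toy data: 6 samples x 2 genes, each gene with one missing entry.
y = ["A", "A", "A", "B", "B", "B"]
X = [
    [1.0, 2.0],
    [1.2, 5.0],
    [None, 2.0],
    [3.0, 5.0],
    [3.1, float("nan")],
    [2.9, 2.0],
]
selected, scores = select_genes(X, y, threshold=1.0)
print(selected, scores)
```

Because each gene is scored on its own observed samples, no assumption about the missingness mechanism enters the score itself. The threshold here is arbitrary; in practice it would have to be chosen for the dataset at hand.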

Highlights

As an important technology in the field of bioinformatics, microarray technology is prominent due to its ability to measure thousands of gene-expression levels simultaneously [1,2].
Gene-expression data are important data obtained from microarray experiments.
For real data, missing value (MV) imputation has a minor impact on downstream classification tasks, but MV imputation is based on the missing-at-random (MAR) assumption; the impact of MV imputation on subsequent FS is seldom considered.

Summary

Introduction

As an important technology in the field of bioinformatics, microarray technology is prominent due to its ability to measure thousands of gene-expression levels simultaneously [1,2]. Gene-expression data obtained from microarray experiments usually suffer from high dimensionality and missing data [3,4]. These characteristics create two problems for downstream gene-expression data analysis (e.g., classification). First, MVs present a challenge to traditional analysis models that require a complete data matrix [5,6]. Second, the high dimensionality of the data leads to high computational complexity [7,8].
