ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction

Sehee Wang, Hyun-Hwan Jeong, Kyung-Ah Sohn

doi:10.1186/s12920-019-0512-9

Abstract

Background: Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for this purpose, and several studies have employed information-theoretic approaches. However, most of these methods require a long processing time. In addition, information-theoretic methods discretize continuous features, which can lead to a loss of information.

Results: In this paper, a novel supervised feature scoring method named ClearF is proposed. The method works directly on continuous-valued data and follows a principle similar to that of feature selection using mutual information, with the added advantage of a reduced computation time. The proposed score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. The method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given a multi-class dataset, such as a case-control study dataset, low-dimensional embedding is first applied to each class, and also to the entire dataset, to obtain compressed representations. Reconstruction is then performed to calculate the error of each feature, and the final score for each feature is defined in terms of these reconstruction errors. The correlation between the information-theoretic measurement and the proposed method is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with that of various algorithms on benchmark datasets.

Conclusions: The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
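The embedding and reconstruction steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PCA (via SVD) as the low-dimensional embedding, and the function names `per_feature_reconstruction_error` and `classwise_errors`, the `n_components` parameter, and the synthetic data are all hypothetical. The abstract does not give the formula that combines the errors into the final score, so the sketch stops at the per-feature errors.

```python
import numpy as np


def per_feature_reconstruction_error(X, n_components):
    """Mean squared reconstruction error of each feature after a rank-k PCA.

    PCA is an assumption here; the abstract only says "low-dimensional embedding".
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]          # shape (k, n_features)
    embedded = centered @ basis.T      # compressed representation
    reconstructed = embedded @ basis   # mapped back to feature space
    return ((centered - reconstructed) ** 2).mean(axis=0)


def classwise_errors(X, y, n_components=1):
    """Per-feature errors for the entire dataset and for each class separately."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errors = {"all": per_feature_reconstruction_error(X, n_components)}
    for label in np.unique(y):
        subset = X[y == label]
        errors[label.item()] = per_feature_reconstruction_error(subset, n_components)
    return errors


# Hypothetical two-class data: feature 0 is shifted in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.repeat([0, 1], 30)
X[y == 1, 0] += 3.0

errors = classwise_errors(X, y, n_components=1)
for name, err in errors.items():
    print(name, np.round(err, 3))
```

A feature score would then be some function of `errors["all"]` and the class-wise entries; how the paper defines that function is not stated in this excerpt.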

Highlights

Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics
An experiment was conducted on The Cancer Genome Atlas (TCGA) breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer
The entropy of the multivariate Gaussian distribution can be calculated as follows, using the determinant of the covariance matrix [19]: $H(X) = \frac{n}{2} + \frac{n}{2}\ln 2\pi + \frac{1}{2}\ln|\Sigma|$, where n is the number of features in X, Σ is the covariance matrix, and |Σ| is its determinant
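As a small numerical illustration of the Gaussian entropy formula in the last highlight, the sketch below evaluates H(X) = n/2 + (n/2) ln 2π + (1/2) ln|Σ| with NumPy. The function name `gaussian_entropy` and the example covariance matrices are assumptions made for illustration; the result is in nats.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a multivariate Gaussian with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    # slogdet returns the sign and log|det|, avoiding overflow for large matrices.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * n + 0.5 * n * np.log(2.0 * np.pi) + 0.5 * logdet


identity_entropy = gaussian_entropy(np.eye(2))
scaled_entropy = gaussian_entropy(4.0 * np.eye(2))
print(identity_entropy, scaled_entropy)
```

Using the log-determinant rather than the determinant itself is a design choice for numerical stability when the number of features is large, as is typical for gene expression data.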

Summary

Introduction

Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for this purpose, and several studies have employed information-theoretic approaches. However, most of these methods require a long processing time. High-dimensional data are also subject to the ‘curse of dimensionality’ [7], in which the number of required samples increases exponentially with the number of features. To overcome this drawback, feature selection is often applied to retain only the important features. It is therefore important to develop feature selection algorithms for the detection of biomarkers.

Journal: BMC Medical Genomics
Publication Date: Jul 1, 2019
Citations: 3
License type: open-access

Similar Papers

A Multi-Label Feature Selection Based on Mutual Information and Ant Colony Optimization
Mohammad Hatami ... Pooya Mehrmohammadi
04 Aug 2020

Feature selection and threshold method based on fuzzy joint mutual information
Omar A.M Salem ... Xi Chen
International Journal of Approximate Reasoning | VOL. 132
23 Feb 2021

Feature selection based on feature interactions with application to text categorization
Xiaochuan Tang ... Yanping Xiang
Expert Systems with Applications | VOL. 120
10 Nov 2018

An isomiR expression panel based novel breast cancer classification approach using improved mutual information
Chaowang Lan ... Eileen M Mcgowan
BMC Medical Genomics | VOL. 11
01 Dec 2018
