Abstract
A fundamental step in many analyses of high-dimensional data is dimension reduction. Two basic approaches are the introduction of new synthetic coordinates and the selection of extant features. Advantages of the latter include interpretability, simplicity, transferability, and modularity. A common criterion for unsupervised feature selection is variance or dynamic range. In practice, however, high-variance features can be noisy, important features can have low variance, or variances may simply not be comparable across features because they are measured on unrelated numeric scales or in different physical units. Moreover, users may want to incorporate measures of signal-to-noise ratio and non-redundancy into feature selection. Here, we introduce the RNR algorithm, which selects features based on (i) the reproducibility of their signal across replicates and (ii) their non-redundancy, measured by linear dependence. It takes as input a typically large set of features measured on a collection of objects with two or more replicates per object. It returns an ordered list of features, i1, i2, …, ik, where feature i1 is the one with the highest reproducibility across replicates, i2 is the one with the highest reproducibility after projecting out the dimension spanned by i1, and so on. Applications to microscopy-based imaging of cells and to proteomics highlight the benefits of the approach. The RNR method is available via Bioconductor (Huber W, Carey VJ, Gentleman R, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 2015;12:115-21) in the R package FeatSeekR. Its source code is also available at https://github.com/tcapraz/FeatSeekR under the GPL-3 open source license.
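To make the greedy selection scheme concrete, here is a minimal, self-contained sketch in R of the idea described above. It is not the FeatSeekR API: the function and variable names are illustrative assumptions, and Pearson correlation between two replicates stands in for the reproducibility score; the projection step removes the dimension spanned by each selected feature before re-scoring.

```r
# Illustrative sketch only (assumed names, not the FeatSeekR interface).
# Simulate: n_obj objects, n_feat features, two noisy replicate measurements.
set.seed(1)
n_obj  <- 50
n_feat <- 20
signal <- matrix(rnorm(n_obj * n_feat), n_obj, n_feat)
rep1   <- signal + matrix(rnorm(n_obj * n_feat, sd = 0.5), n_obj, n_feat)
rep2   <- signal + matrix(rnorm(n_obj * n_feat, sd = 0.5), n_obj, n_feat)

select_rnr <- function(rep1, rep2, k) {
  selected <- integer(0)
  r1 <- rep1
  r2 <- rep2
  for (step in seq_len(k)) {
    remaining <- setdiff(seq_len(ncol(r1)), selected)
    # Reproducibility score: correlation of each feature across replicates.
    score <- sapply(remaining, function(j) cor(r1[, j], r2[, j]))
    best  <- remaining[which.max(score)]
    selected <- c(selected, best)
    # Project the remaining features onto the orthogonal complement of the
    # selected feature, separately in each replicate, so that only
    # non-redundant signal is scored in the next iteration.
    proj_out <- function(m, j) {
      b <- m[, j]
      m - outer(b, drop(crossprod(b, m)) / sum(b^2))
    }
    r1 <- proj_out(r1, best)
    r2 <- proj_out(r2, best)
  }
  selected
}

# Indices of the top 5 reproducible, non-redundant features.
select_rnr(rep1, rep2, k = 5)
```

The key design point illustrated here is that the ranking is not a one-shot sort by a per-feature score: after each pick, the remaining features are re-scored on their residuals, so a feature that is highly reproducible but linearly dependent on an already selected one drops down the list.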