Abstract

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented as publicly available R packages, and these typically offer limited options. The R package MXM offers a variety of feature selection algorithms, and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time to event data the user can choose among Cox, Weibull, log-logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data, i.e., data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load a dataset of 16 GB in size; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM's algorithms using real high-dimensional data from various applications.
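The multiple-solutions capability described in point (c) is provided by MXM's SES (Statistically Equivalent Signatures) algorithm. The following is a minimal sketch, assuming the MXM package is installed; the call signature (`SES` with `max_k`, `threshold`, `test`) and output slots (`@selectedVars`, `@signatures`) follow the package's documented interface, and the simulated data are purely illustrative:

```r
# Minimal sketch of multiple-solution feature selection with MXM::SES.
# Assumes the MXM package is installed from CRAN.
library(MXM)

set.seed(1)
x <- matrix(rnorm(100 * 50), nrow = 100)  # 100 samples, 50 candidate features
y <- x[, 1] - 2 * x[, 2] + rnorm(100)     # continuous target driven by features 1 and 2

# max_k bounds the size of the conditioning sets; threshold is the
# significance level; testIndFisher is the test for continuous targets.
mod <- SES(target = y, dataset = x, max_k = 3, threshold = 0.05,
           test = "testIndFisher")

mod@selectedVars   # indices of the selected features
mod@signatures     # statistically equivalent signatures, one per row
```

Each row of `@signatures` is a set of features carrying statistically equivalent information, so a practitioner can pick whichever signature contains the cheapest or easiest-to-measure features.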

Highlights

  • Given a target variable Y of n measurements and a set X of p features, the problem of feature selection (FS) is to identify the minimal set of features with the highest predictive performance on the target variable of interest

  • A natural question that arises is why researchers and practitioners should perform FS. The answer is a variety of reasons[1], such as: a) many features may be expensive to measure, especially in the clinical and medical domains; b) FS may result in more accurate models by removing noise while treating the curse of dimensionality; c) the final, parsimonious models are computationally cheaper and often easier to understand and interpret; d) future experiments can benefit from prior feature selection tasks and provide more insight into the problem of interest, its characteristics and structure; e) FS is indissolubly connected with causal inference, which tries to identify the causal mechanism of the system that generated the data

  • 2 (1.08%) R packages treat the case of FS with multiple datasets, while only 4 (2.17%) packages are devised for high-volume data
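The same selection algorithms apply to non-continuous targets by swapping the conditional independence test, as in point (a) of the abstract. The sketch below uses MMPC on a simulated survival (time-to-event) target; it assumes MXM and the survival package are installed, and `censIndCR` is the package's Cox-regression-based test:

```r
# Sketch: feature selection on a survival target with MXM::MMPC.
# Assumes the MXM and survival packages are installed; data are simulated.
library(MXM)
library(survival)

set.seed(2)
x <- matrix(rnorm(200 * 30), nrow = 200)      # 200 samples, 30 candidate features
time <- rexp(200, rate = exp(0.5 * x[, 3]))   # event times depend on feature 3
status <- rbinom(200, 1, 0.8)                 # censoring indicator
y <- Surv(time, status)

# censIndCR: conditional independence test based on Cox regression.
mod <- MMPC(target = y, dataset = x, max_k = 3, threshold = 0.05,
            test = "censIndCR")

mod@selectedVars   # indices of the selected features
```

Choosing `test = "censIndWR"` instead would base the selection on Weibull regression, illustrating point (b): the regression model is a plug-in component of the algorithm.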


Summary

20 Sep 2018

Any reports and responses or comments on the article can be found at the end of the article. This article is included in the RPackage gateway. We are grateful to the reviewers for taking the time to read the paper and for the comments they raised. We have addressed all comments raised by the reviewers, proofread the paper, and made some additional changes. We hope the paper is now easier to read.

