Abstract

In knowledge-based systems, besides obtaining good output prediction accuracy, it is crucial to understand which subset of input variables has the most influence on the output, with the goal of gaining deeper insight into the underlying process. These requirements call for logistic model estimation techniques that provide a sparse solution, i.e., where coefficients associated with non-important variables are set to zero. In this work we compare the performance of two methods: the first is based on the well-known Least Absolute Shrinkage and Selection Operator (LASSO), which involves regularization with an ℓ1 norm; the second is the Relevance Vector Machine (RVM), which is based on a Bayesian implementation of the linear logistic model. The two methods are extensively compared on real and simulated datasets. Results show that, in general, the two approaches are comparable in terms of prediction performance. RVM outperforms LASSO in both structure recovery (estimation of the correct non-zero model coefficients) and prediction accuracy as the dimensionality of the data increases. However, LASSO shows performance comparable to RVM when the dimensionality of the data is much higher than the number of samples, i.e., p ≫ n.
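
The following is a minimal sketch, not the authors' code, of the LASSO side of this comparison: an ℓ1-penalized logistic regression fitted on synthetic data with scikit-learn. The dataset, the solver, and the regularization strength C are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): sparse logistic regression via an
# l1 penalty, as in the LASSO approach compared in the paper. The data and
# hyperparameters below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 50                       # n samples, p candidate variables
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]          # only 3 truly relevant coefficients
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta))).astype(int)

# C is the inverse regularization strength; smaller C -> sparser model
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print("non-zero coefficients:", np.flatnonzero(model.coef_))
```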

Highlights

  • Techniques for the estimation of sparse models have gained increasing attention in the last two decades and have found several practical applications in different areas of science and engineering

  • Assuming that in logistic regression the output variable y follows a Bernoulli distribution, the probability of the outcome for the i-th data point can be written in compact form as $p(y_i \mid x_i; \theta) = \sigma_\theta(x_i)^{y_i} (1 - \sigma_\theta(x_i))^{1-y_i}$ (a minimal numerical sketch of this likelihood follows the list)

  • In general, the performance on all indicators improves for both methods with increasing values of n; for example, the misclassification error (MCE) for dataset (f) in Figure 5 decreases as n increases
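
As a minimal numerical sketch of the Bernoulli likelihood quoted above (the variable names theta, X, y are illustrative, not from the paper):

```python
# Computes the log of the compact Bernoulli form above, summed over points:
# log p(y_i | x_i; theta) = y_i*log(s_i) + (1 - y_i)*log(1 - s_i),
# where s_i = sigma_theta(x_i) is the logistic function of x_i @ theta.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    s = sigmoid(X @ theta)
    return np.sum(y * np.log(s) + (1 - y) * np.log(1 - s))

# Tiny example: two data points, two coefficients
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1, 0])
theta = np.array([0.5, -0.25])
print(log_likelihood(theta, X, y))
```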


Summary

Introduction

Techniques for the estimation of sparse models have gained increasing attention in the last two decades and have found several practical applications in different areas of science and engineering. In high-dimensional settings, the number of measured variables is large and not all of them are relevant, in terms of correlation with the output. In these cases, sparse models are important to avoid overfitting and improve model prediction performance, as well as to identify a subset of input variables representing the most important drivers of the output variation [8]. Filter methods first identify a subset of variables, for example using correlation analysis, and then pass this subset as input to standard learning algorithms.
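
A hedged sketch of the filter strategy just described, assuming a correlation-based ranking, an arbitrary cutoff k, and logistic regression as the downstream algorithm (none of these choices are prescribed by the paper):

```python
# Filter-method sketch: rank variables by absolute correlation with the
# output, keep the top k, then train a standard classifier on the reduced
# input. k and the classifier choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def correlation_filter(X, y, k):
    """Indices of the k features most correlated (in absolute value) with y."""
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(corr))[:k]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.standard_normal(200) > 0).astype(int)

selected = correlation_filter(X, y, k=5)
clf = LogisticRegression().fit(X[:, selected], y)
print("selected variables:", selected)
```

Note that the screening step scores each variable in isolation, which is what distinguishes filter methods from embedded approaches such as LASSO and RVM, where selection happens inside the model fit itself.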

