Abstract

In high-dimensional data, many sparse regression methods have been proposed, but they may not be robust against outliers. Recently, the use of the density power weight has been studied for robust parameter estimation, and the corresponding divergences have been discussed. One such divergence is the γ-divergence, and the estimator based on the γ-divergence is known for its strong robustness. In this paper, we extend the γ-divergence to the regression problem, consider robust and sparse regression based on the γ-divergence, and show that it retains strong robustness under heavy contamination even when the outliers are heterogeneous. The loss function is constructed from an empirical estimate of the γ-divergence with sparse regularization, and the parameter estimate is defined as the minimizer of this loss. To obtain the robust and sparse estimate, we propose an efficient update algorithm with a monotone decreasing property of the loss function. In particular, we discuss linear regression with L1 regularization in detail. Numerical experiments and real data analyses show that the proposed method outperforms existing robust and sparse methods.
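The density power weighting mentioned above can be illustrated with a minimal sketch (the function name, the Gaussian error model, and all numeric values here are illustrative, not taken from the paper): each observation is weighted by its model density raised to the power γ, so points far from the bulk of the data receive weights near zero.

```python
import numpy as np

def density_power_weights(residuals, sigma=1.0, gamma=0.5):
    """Weight each observation by its Gaussian density raised to the
    power gamma; gross outliers receive weights close to zero."""
    dens = np.exp(-residuals**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return dens**gamma

r = np.array([0.1, -0.2, 0.05, 8.0])  # last residual is an outlier
w = density_power_weights(r)
# the outlier's weight is negligible compared to the inliers'
```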

Highlights

  • In high-dimensional data, sparse regression methods have been intensively studied

  • The γ-divergence proposed by Fujisawa and Eguchi [9] is known for its strong robustness: the latent bias can be kept sufficiently small even under heavy contamination

  • We propose the robust and sparse regression problem based on the γ-divergence


Summary

Introduction

In high-dimensional data, sparse regression methods have been intensively studied. The Lasso [1] is a typical sparse linear regression method with L1 regularization, but it is not robust against outliers. Several robust and sparse linear regression methods have therefore been proposed. The sparse least trimmed squares (sLTS) [4] is a sparse version of the well-known robust linear regression method LTS [5], based on the trimmed loss function with L1 regularization. We consider a loss function based on the γ-divergence with sparse regularization and propose an update algorithm to obtain the robust and sparse estimate. Fujisawa and Eguchi [9] exploited a Pythagorean relation on the γ-divergence, but this relation is not compatible with sparse regularization. Instead, we use the majorization-minimization (MM) algorithm [14]. The R package “gamreg”, which implements our proposed method, can be downloaded at http://cran.r-project.org/web/packages/gamreg/.
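The MM strategy described above can be sketched as follows. This is a simplified illustration, not the paper's implementation (which is in the gamreg package): it assumes a Gaussian linear model with a fixed, known scale σ, and all function names and default values are hypothetical. Each outer MM iteration turns the γ-divergence loss into a weighted least squares surrogate, with density power weights that vanish for gross outliers, and solves the resulting weighted lasso by coordinate descent with soft-thresholding.

```python
import numpy as np

def gamma_loss(beta, X, y, sigma, gamma, lam):
    """Empirical gamma-divergence loss (up to additive constants) plus
    the L1 penalty; useful for monitoring the monotone decrease."""
    r = y - X @ beta
    return (-np.log(np.mean(np.exp(-gamma * r**2 / (2 * sigma**2)))) / gamma
            + lam * np.abs(beta).sum())

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_gamma_regression(X, y, gamma=0.5, lam=0.1, sigma=1.0,
                            n_outer=50, n_inner=100, tol=1e-8):
    """MM iterations: compute normalized density power weights from the
    current residuals, then solve the weighted lasso surrogate by
    cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_outer):
        r = y - X @ beta
        # density power weights: observations with large residuals
        # (outliers) receive weights close to zero
        w = np.exp(-gamma * r**2 / (2 * sigma**2))
        w /= w.sum()
        beta_old = beta.copy()
        # coordinate descent on the weighted lasso surrogate
        for _ in range(n_inner):
            for j in range(p):
                r_j = y - X @ beta + X[:, j] * beta[j]  # partial residual
                num = np.sum(w * X[:, j] * r_j)
                den = np.sum(w * X[:, j] ** 2)
                beta[j] = soft_threshold(num, lam * sigma**2) / den
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```

In this sketch the outliers are suppressed automatically: a residual of size several σ makes its weight exp(-γr²/2σ²) essentially zero, so heavily contaminated points barely influence the weighted lasso step.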

Regression Based on γ-Divergence
Estimation for γ-Regression
MM Algorithm for Sparse γ-Regression
Sparse γ-Linear Regression
Robust Cross-Validation
Robust Properties
Homogeneous Contamination
Heterogeneous Contamination
Redescending Property
Numerical Experiment
Regression Models for Simulation
Performance Measure
Comparative Methods
Initial Points
How to Choose Tuning Parameters
Result
Computational Cost
NCI-60 Cancer Cell Panel
Protein Homology Dataset
Findings
Conclusions
