Robust regression via error tolerance

Anton Björklund, Kai Puolamäki, Kimmo Kallonen, Emilia Oikarinen, Andreas Henelius

doi:10.1007/s10618-022-00819-2

Abstract

Real-world datasets are often characterised by outliers, that is, data items that do not follow the same structure as the rest of the data. These outliers might negatively influence modelling of the data. In data analysis it is, therefore, important to consider methods that are robust to outliers. In this paper we develop a robust regression method that finds the largest subset of data items that can be approximated using a sparse linear model to a given precision. We show that this can yield the best possible robustness to outliers. However, this problem is NP-hard, and to solve it we present an efficient approximation algorithm, termed SLISE. Our method extends existing state-of-the-art robust regression methods, especially in terms of speed on high-dimensional datasets. We demonstrate our method by applying it to both synthetic and real-world regression problems.
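To make the criterion in the abstract concrete, the following minimal sketch counts how many data items a given linear model approximates to within a precision eps. It is not the SLISE algorithm: it only evaluates the subset-size criterion for two fixed candidate models on toy data. The function name tolerance_subset, the toy data, and the value of eps are assumptions made for illustration.

```python
import numpy as np

def tolerance_subset(X, y, alpha, eps):
    """Boolean mask of the items whose absolute residual is at most eps.

    Hypothetical helper for illustration; not part of any SLISE package.
    """
    residuals = y - X @ alpha
    return np.abs(residuals) <= eps

# Toy data (assumed): y = 2x for eight items, plus two outliers.
x = np.arange(10, dtype=float)
X = x.reshape(-1, 1)          # single feature, no intercept
y = 2.0 * x
y[8:] = [-40.0, -50.0]        # the two outliers

# Candidate 1: ordinary least squares on all items.
alpha_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# Candidate 2: the model that generated the inliers.
alpha_true = np.array([2.0])

n_ols = int(tolerance_subset(X, y, alpha_ols, eps=0.5).sum())
n_true = int(tolerance_subset(X, y, alpha_true, eps=0.5).sum())
print(f"items within eps: OLS model {n_ols}, inlier model {n_true}")
```

Under this criterion the inlier model is preferred because it covers more items within the tolerance, even though the least-squares fit has the smaller total squared error. Searching over all models for the largest such subset is the NP-hard problem the abstract refers to.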

Highlights

In practically all analyses of real-world data we encounter outliers, i.e., data items that do not follow the same patterns as the majority of the data.
Robust regression methods can be used as almost drop-in replacements for linear regression, which is still widely used because of its inherent interpretability and simplicity.
In this paper we present a sparse robust regression method, termed SLISE (Sparse Linear Subset Explanations), that achieves the highest possible theoretical robustness and outperforms many existing state-of-the-art robust regression methods in terms of scalability on large datasets.

Summary

Introduction

In practically all analyses of real-world data we encounter outliers, i.e., data items that do not follow the same patterns as the majority of the data. Such items are problematic, since they may negatively influence modelling of the data. This is observed, for instance, in ordinary least-squares (OLS) regression, where already a single outlier may lead to arbitrarily large errors (Donoho and Huber 1983). It is, therefore, important to consider robust methods that effectively avoid the influence of outliers. Robust regression can also be used to search for outliers by investigating the data items that do not adhere to the robust model.
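The sensitivity of OLS to a single outlier can be checked numerically. The sketch below fits a line by least squares to ten points and then corrupts one response value by increasingly large amounts. The data, the helper name ols_slope, and the chosen magnitudes are assumptions for illustration only.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line y ~ a*x + b (hypothetical helper)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0]

# Clean toy data (assumed): exactly y = 3x + 1.
x = np.arange(10, dtype=float)
y = 3.0 * x + 1.0

slopes = []
for magnitude in (0.0, 1e2, 1e4, 1e6):
    y_bad = y.copy()
    y_bad[-1] += magnitude    # corrupt only the last item
    slopes.append(ols_slope(x, y_bad))

for magnitude, slope in zip((0.0, 1e2, 1e4, 1e6), slopes):
    print(f"outlier shift {magnitude:>9.0f} -> fitted slope {slope:.3f}")
```

The fitted slope grows without bound as the single corrupted value grows, which is the behaviour that motivates robust alternatives.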



Journal: Data Mining and Knowledge Discovery
Publication Date: Jan 27, 2022
Citations: 4
License type: open-access

Similar Papers

Comparative Study in Controlling Outliers and Multicollinearity Using Robust Performance Jackknife Ridge Regression Estimator Based on Generalized-M and Least Trimmed Square Estimator
Gustina Saputri ... Netti Herawati
Jambura Journal of Mathematics | VOL. 6
01 Aug 2024

Which robust regression technique is appropriate under violated assumptions? A simulation study
Jaejin Kim ... Johnson Ching-Hong Li
Methodology | VOL. 19
22 Dec 2023

An asymmetric bisquare regression for mixed cyberattack-resilient load forecasting
Shangrui Zhao ... Xi-An Li
Expert Systems with Applications | VOL. 210
12 Aug 2022

Robust statistical methods for high-dimensional data, with applications in tribology
Pia Pfeiffer ... Peter Filzmoser
Analytica Chimica Acta | VOL. 1279
05 Sep 2023
