Abstract

Real-world datasets are often characterised by outliers: data items that do not follow the same structure as the rest of the data. These outliers might negatively influence modelling of the data. In data analysis it is, therefore, important to consider methods that are robust to outliers. In this paper we develop a robust regression method that finds the largest subset of data items that can be approximated using a sparse linear model to a given precision. We show that this can yield the best possible robustness to outliers. However, this problem is NP-hard, so we present an efficient approximation algorithm, termed SLISE. Our method extends existing state-of-the-art robust regression methods, especially in terms of speed on high-dimensional datasets. We demonstrate our method by applying it to both synthetic and real-world regression problems.
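The optimisation problem described above can be made concrete with a small sketch. It assumes a linear model y ≈ Xα without an intercept, treats an item as "approximated to a given precision" when its squared residual is at most ε², and scores a coefficient vector by the size of that subset minus an L1 penalty λ‖α‖₁ that favours sparsity. The function names, the penalty weight `lam` and the way the two terms are combined are illustrative assumptions, not the SLISE loss from the paper. The search is a plain RANSAC-style random search, not the SLISE approximation algorithm.

```python
import numpy as np


def subset_objective(X, y, alpha, epsilon, lam):
    """Hypothetical score: epsilon-subset size minus an L1 sparsity penalty.

    Returns the score and a boolean mask of the items in the subset.
    """
    residuals = y - X @ alpha
    in_subset = residuals ** 2 <= epsilon ** 2  # items approximated to precision epsilon
    score = in_subset.sum() - lam * np.abs(alpha).sum()
    return score, in_subset


def random_subset_search(X, y, epsilon, lam=0.1, n_trials=500, seed=0):
    """RANSAC-style random search (not the SLISE algorithm).

    Fits an exact linear model to a random minimal set of d items, scores it
    on all items with subset_objective, and keeps the best candidate found.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_alpha = np.zeros(d)
    best_score, best_subset = subset_objective(X, y, best_alpha, epsilon, lam)
    for _ in range(n_trials):
        idx = rng.choice(n, size=d, replace=False)  # random minimal sample
        alpha, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        score, subset = subset_objective(X, y, alpha, epsilon, lam)
        if score > best_score:
            best_alpha, best_score, best_subset = alpha, score, subset
    return best_alpha, best_subset


# Toy data (assumed for illustration): 80 inliers on a sparse linear model,
# plus 20 outliers whose responses are shifted by +10.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_alpha = np.array([2.0, 0.0, -1.0])
y = X @ true_alpha + rng.uniform(-0.01, 0.01, size=100)
y[80:] += 10.0

alpha, subset = random_subset_search(X, y, epsilon=0.1)
print(np.round(alpha, 2), subset[:80].sum(), subset[80:].sum())
```

In this sketch the L1 term only breaks near-ties between candidates; it does not drive coefficients to exactly zero. Checking every possible subset exactly would be exponential in the number of items, which is consistent with the NP-hardness noted above and is why an approximation is needed.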

Highlights

  • In practically all analyses of real-world data we encounter outliers, i.e., data items that do not follow the same patterns as the majority of the data

  • Robust regression methods can be used as almost drop-in replacements for linear regression, which is still widely used because of its inherent interpretability and simplicity

  • In this paper we present a sparse robust regression method, termed SLISE (Sparse Linear Subset Explanations), that achieves the highest possible theoretical robustness and outperforms many existing state-of-the-art robust regression methods in terms of scalability on large datasets

Introduction

In practically all analyses of real-world data we encounter outliers, i.e., data items that do not follow the same patterns as the majority of the data. Such items are problematic, since they may negatively influence modelling of the data. This is observed, for instance, in ordinary least-squares (OLS) regression, where already a single outlier may lead to arbitrarily large errors (Donoho and Huber 1983). It is, therefore, important to consider robust methods that effectively avoid the influence of outliers. Robust regression can also be used to search for outliers by investigating the data items that do not adhere to the robust model.
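The claim that a single outlier can make the OLS error arbitrarily large can be checked directly. The following sketch, assuming NumPy and a toy one-dimensional dataset chosen for illustration, fits a line to ten points lying exactly on y = 1 + 2x and then shifts the response of one data item by increasing amounts.

```python
import numpy as np


def ols_fit(x, y):
    """Ordinary least squares for y ≈ a + b*x; returns the array [a, b]."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


x = np.arange(10, dtype=float)
y_clean = 1.0 + 2.0 * x  # ten points exactly on the line y = 1 + 2x


def slope_with_outlier(shift):
    """OLS slope after adding `shift` to the response of the last data item only."""
    y = y_clean.copy()
    y[-1] += shift
    return ols_fit(x, y)[1]


for shift in [0.0, 10.0, 100.0, 1000.0]:
    print(f"shift {shift:7.1f}: fitted slope {slope_with_outlier(shift):.3f}")
```

Here the fitted slope grows linearly with the shift, by 4.5/82.5 ≈ 0.055 per unit (the last item's deviation from the mean of x divided by the sum of squared deviations of x), so moving one data item far enough makes the error in the slope as large as desired.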
