Provable training set debugging for linear regression

Xiaomin Zhang, Po-Ling Loh, Xiaojin Zhu

doi:10.1007/s10994-021-06040-4

Abstract

We investigate problems in penalized M-estimation, inspired by applications in machine learning debugging. Data are collected from two pools: one containing data with possibly contaminated labels, and the other known to contain only cleanly labeled points. We first formulate a general statistical algorithm for identifying buggy points and provide rigorous theoretical guarantees when the data follow a linear model. We then propose a procedure, also with theoretical guarantees, for selecting the tuning parameter of our Lasso-based algorithm. Next, we consider a two-person “game” played between a bug generator and a debugger, where the debugger can augment the contaminated data set with cleanly labeled versions of points in the original data pool. We develop and analyze a debugging strategy formulated as a mixed integer linear program (MILP). Finally, we provide empirical results that support our theoretical results and demonstrate the utility of the MILP strategy.
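
To make the two-pool setup concrete, the following is a minimal sketch of one way a Lasso-based debugger could look. The abstract does not state the objective, so everything here is an assumption: a mean-shift model y = Xβ + γ + noise for the possibly contaminated pool, an L1 penalty on γ, and a least-squares term on the clean pool weighted by a hypothetical parameter `eta`. The function names `soft_threshold` and `debug_lasso`, the alternating-minimization solver, and the synthetic data are illustrative and not taken from the paper. Points with a nonzero entry of γ are flagged as buggy.

```python
import numpy as np


def soft_threshold(r, t):
    """Elementwise soft-thresholding: shrink r toward zero by t."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)


def debug_lasso(X, y, X_clean, y_clean, lam, eta=1.0, n_iter=500):
    """Assumed objective (not confirmed by the abstract):

        1/(2n) ||y - X beta - gamma||^2
        + eta/(2m) ||y_clean - X_clean beta||^2
        + lam * ||gamma||_1

    Solved by alternating exact minimization over beta and gamma.
    """
    n, _ = X.shape
    m = X_clean.shape[0]
    gamma = np.zeros(n)
    # Normal-equation matrix for the beta update; it does not depend on gamma.
    A = X.T @ X / n + eta * X_clean.T @ X_clean / m
    b_clean = eta * X_clean.T @ y_clean / m
    for _ in range(n_iter):
        # beta step: weighted least squares on both pools with gamma fixed.
        beta = np.linalg.solve(A, X.T @ (y - gamma) / n + b_clean)
        # gamma step: soft-threshold the residuals of the contaminated pool.
        gamma = soft_threshold(y - X @ beta, n * lam)
    return beta, gamma


# Synthetic illustration (all values are made up for this sketch).
rng = np.random.default_rng(0)
n, m, p = 100, 20, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
X_clean = rng.normal(size=(m, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)
y_clean = X_clean @ beta_true + 0.1 * rng.normal(size=m)
y[:5] += 10.0  # contaminate the labels of the first five points

beta_hat, gamma_hat = debug_lasso(X, y, X_clean, y_clean, lam=1.0 / n)
flagged = np.flatnonzero(gamma_hat)  # indices the sketch marks as buggy
print(flagged)
```

In this sketch the threshold `n * lam` separates ordinary noise from label shifts, which is why the choice of tuning parameter matters: set it too low and clean points are flagged, too high and small bugs are missed.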
