Abstract

A coreset is usually a small weighted subset of an input set of items that provably approximates their loss function for a given set of queries (models, classifiers, hypotheses). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset problem to the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for n ≫ 1/ε, where ε is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.
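To make the reduction concrete, the following is a minimal sketch (not the paper's algorithm; all sizes, the squared-distance loss, and the uniform-sampling baseline are assumptions for illustration). For a finite query set, each point is mapped to the vector of its losses over all queries; a weighted subset that approximates the sum of these vectors bounds the average per-query error.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's algorithm): for a finite
# query set X, the average-case coreset guarantee only constrains the loss
# summed over the queries, which is exactly a vector summarization task.
rng = np.random.default_rng(0)
n, d, num_queries = 2_000, 3, 50
P = rng.normal(size=(n, d))                # input points
X = rng.normal(size=(num_queries, d))      # finite query set (candidate models)

def loss(p, x):
    """Illustrative per-point loss of query x on point p (squared distance)."""
    return np.sum((p - x) ** 2)

# Map each point to the vector of its losses over all queries.
V = np.array([[loss(p, x) for x in X] for p in P])   # shape (n, num_queries)

# A weighted subset whose weighted sum approximates the column sums of V ...
idx = rng.choice(n, size=200, replace=False)
w = np.full(len(idx), n / len(idx))                   # naive uniform weights
approx = (w[:, None] * V[idx]).sum(axis=0)

# ... bounds the average (over the queries) error of the induced coreset.
avg_error = np.mean(np.abs(V.sum(axis=0) - approx)) / n
print(f"average per-query error (normalized): {avg_error:.4f}")
```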

Highlights

  • In this paper, we assume that the input is a set P of items, called points

  • In what follows we assume that the points of our input set P lie inside the unit ball (∀ p ∈ P : ‖p‖ ≤ 1). For such an input set, we present a construction of a variant of a vector summarization coreset, where the error is ε and does not depend on the variance of the input

  • We show how to compute a vector summarization coreset, with high probability, in time that is sublinear in the input size |Q| = n (see the illustrative sketch after this list)
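The sketch below illustrates the setting of these highlights under assumed values (dimensions, ε, and the sample-size choice are placeholders, and the uniform-sampling estimator is a naive stand-in for the paper's construction): the input is rescaled into the unit ball, and the sum is then estimated from a sample whose size depends only on the accuracy parameter rather than on n.

```python
import numpy as np

# Minimal sketch (assumed values, not the paper's construction).
rng = np.random.default_rng(1)
n, d = 100_000, 10
Q = rng.normal(size=(n, d))

# Rescale so that every point lies inside the unit ball: ||p|| <= 1.
Q = Q / np.max(np.linalg.norm(Q, axis=1))

eps = 0.05
m = int(np.ceil(1 / eps ** 2))     # sample size independent of n (assumed choice)
idx = rng.choice(n, size=m)        # sublinear: the estimate touches only m points
estimate = (n / m) * Q[idx].sum(axis=0)

# Full sum is computed here only to verify the approximation.
err = np.linalg.norm(Q.sum(axis=0) - estimate) / n
print(f"m = {m} samples, normalized sum error: {err:.4f}")
```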


Summary

Introduction

We assume that the input is a set P of items, called points; P is a finite set of n points in ℝ^d or in some other metric space. In the context of PAC (probably approximately correct) learning [1] or empirical risk minimization [2], it represents the training set. In supervised learning, every point in P may include its label or class. We assume a given function w : P → (0, ∞), called the weights function, that assigns a weight w(p) > 0 to every point p ∈ P. The weights function represents a distribution of importance over the input points, where the natural choice is the uniform distribution, i.e., w(p) = 1/|P| for every p ∈ P. We are given a (possibly infinite) set X, the set of queries [3], which represents candidate models or hypotheses, e.g., neural networks [4], SVMs [5], or a set of vectors in ℝ^d with tuning parameters as in linear/ridge/lasso regression [6,7,8].
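A minimal sketch of this setup follows (the names, the uniform weights, and the squared-error loss are illustrative assumptions, not the paper's definitions): a weighted input set (P, w) and the weighted loss that a single candidate query incurs on it.

```python
import numpy as np

# Minimal sketch of the weighted-input setup described above (illustrative only).
rng = np.random.default_rng(2)
P = rng.normal(size=(1_000, 4))          # n points in R^d
w = np.full(len(P), 1.0 / len(P))        # uniform weights function w(p) = 1/|P|

def weighted_loss(x, P, w):
    """Weighted loss of one query x over (P, w); squared error as an example."""
    per_point = np.sum((P - x) ** 2, axis=1)
    return float(np.dot(w, per_point))

x = rng.normal(size=4)                   # one candidate query/model from X
print(f"weighted loss of x on (P, w): {weighted_loss(x, P, w):.3f}")
```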

