Robust separation of finite sets via quadratics

James E Falk, Vladimir E Karlov

doi:10.1016/s0305-0548(99)00134-3

Abstract

Given a pair of finite disjoint sets A and B in R^n, a fundamental problem with many important applications is to efficiently determine a non-trivial, yet ‘simple’, function f(x) belonging to a prespecified family F which separates these sets when they are separable, or ‘nearly’ separates them when they are not. The class of functions F most commonly applied to data is the linear one (because linear programming is often a convenient and efficient tool to employ both in determining separability and in generating a suitable separator). When the sets are not linearly separable, they may still be separable over a wider class F of functions, e.g., quadratics. Even when the sets are linearly separable, another function may separate them ‘better’ in the sense of more accurately predicting the status of points outside of A ∪ B. We define a ‘robust’ separator f as one for which the minimum Euclidean distance between A ∪ B and the set S = {x ∈ R^n : f(x) = 0} is maximal. In this paper we focus on robust quadratic separators and develop an algorithm using sequential linear programming to produce one when one exists. Numerical results are presented.

Scope and purpose

A fundamental problem with many important applications is to efficiently determine a non-trivial, yet ‘simple’, function f(x) which separates a pair of sets A and B in the sense that f is positive over A and negative over B. The function is then used to associate either A or B with points outside of the sets. As an example, if A consists of the results of tissue samples from patients with cancer, and B consists of the results of tissue samples from patients without cancer, a new sample c will be associated with either A or B according to the sign of the value f(c). Most of the literature to date has focused on linear functions f, as they are relatively easy to compute. In this paper we explore the use of quadratic functions. The advantage of using such functions is twofold: they can often separate when linear functions cannot, and they can separate more accurately than linear functions. We first define the notion of a ‘robust’ separating function, one which is as immune as possible (given the data) to small perturbations of the data. We then suggest an algorithm to (approximately) compute a robust quadratic separator, and show that it can be computed via a sequence of linear programs. The algorithm is tested both on randomly generated problems and on the publicly available ‘Wisconsin Breast Cancer Database’. Its accuracy on this database is somewhat higher than that obtained by using linear robust separators.
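To make the notions of separation and robustness concrete, the following is a minimal sketch in Python, not the authors' algorithm. It assumes a hand-picked quadratic f(x) = x1*x2 and a four-point toy data set (both chosen here for illustration, not taken from the paper). No linear function separates these two sets, since adding the two inequalities required on A and the two required on B forces the constant term to be both positive and negative, yet the quadratic is positive on A and negative on B. The names `quadratic`, `separates`, and `distance_to_axes` are hypothetical helpers.

```python
def quadratic(Q, c, d, x):
    """Evaluate f(x) = x^T Q x + c^T x + d for a point x given as a sequence."""
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(c[i] * x[i] for i in range(n))
    return quad + lin + d


def separates(f, A, B):
    """True if f is strictly positive on every point of A and negative on B."""
    return all(f(a) > 0 for a in A) and all(f(b) < 0 for b in B)


def distance_to_axes(x):
    """Euclidean distance from a 2-D point to the set {x : x1 * x2 = 0}.

    This closed form holds only for the toy separator below, whose zero
    set is the union of the two coordinate axes.
    """
    return min(abs(x[0]), abs(x[1]))


# Toy data (assumed for illustration): an XOR-like configuration.
A = [(1.0, 1.0), (-1.0, -1.0)]
B = [(1.0, -1.0), (-1.0, 1.0)]

# Symmetric Q chosen so that x^T Q x = x1 * x2; no linear or constant term.
Q = [[0.0, 0.5], [0.5, 0.0]]
c = [0.0, 0.0]
d = 0.0


def f(x):
    return quadratic(Q, c, d, x)


is_separated = separates(f, A, B)

# The quantity a robust separator maximizes: the smallest distance from
# any data point to the zero set S = {x : f(x) = 0}.
margin = min(distance_to_axes(p) for p in A + B)

# A new point is associated with A or B according to the sign of f.
new_point = (2.0, 3.0)
label = "A" if f(new_point) > 0 else "B"
```

Here the separator is fixed in advance and only evaluated. Finding the coefficients Q, c, d that maximize the margin is the harder problem the paper addresses with a sequence of linear programs.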

Full Text