Abstract

Clustering is one of the most important problems in data mining, machine learning, and biological population division. Robust variants of the [Formula: see text]-means problem, which include [Formula: see text]-means with penalties and [Formula: see text]-means with outliers, are also an active research branch. Most of these problems are NP-hard, including the most classical one, the [Formula: see text]-means problem. For NP-hard problems, heuristic algorithms are a powerful tool; when the quality of the output can be guaranteed, such an algorithm is called an approximation algorithm. In this paper, combining the two robust settings, we consider the [Formula: see text]-means problem with penalties and outliers ([Formula: see text]-MPO). In the [Formula: see text]-MPO, we are given an [Formula: see text]-point set [Formula: see text], a penalty cost [Formula: see text] for each [Formula: see text], an integer [Formula: see text], and an integer [Formula: see text]. The goal is to find a center subset [Formula: see text] with [Formula: see text], a penalty subset [Formula: see text], and an outlier subset [Formula: see text] with [Formula: see text], such that the total cost, consisting of the connection cost and the penalty cost, is minimized. We present an approximation algorithm based on a heuristic local search scheme. Using a single-swap operation, we obtain a [Formula: see text]-approximation algorithm.
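
To make the problem statement and the single-swap idea concrete, here is a minimal Python sketch. It is not the authors' algorithm, and it carries no approximation guarantee. The formulas in the abstract are not recoverable, so the following are assumptions: the connection cost is the squared Euclidean distance to the nearest center, at most `q` points may be discarded as outliers at no cost, and candidate centers are restricted to the input points. The names `mpo_cost`, `single_swap_local_search`, `k`, `q`, and `eps` are hypothetical.

```python
import numpy as np


def mpo_cost(points, penalties, centers, q):
    """Cost of a center set under the assumed objective.

    Each point pays the cheaper of its squared distance to the nearest
    center or its penalty; the q most expensive points are then dropped
    as outliers (assumption: outliers cost nothing).
    """
    diffs = points[:, None, :] - centers[None, :, :]
    d2 = (diffs ** 2).sum(axis=2).min(axis=1)   # nearest-center squared distance
    per_point = np.minimum(d2, penalties)       # connect or pay the penalty
    keep = max(len(per_point) - q, 0)           # discard the q largest costs
    return float(np.sort(per_point)[:keep].sum())


def single_swap_local_search(points, penalties, k, q, eps=1e-3, seed=0):
    """Swap one center for one non-center while the cost drops by a (1 - eps) factor."""
    rng = np.random.default_rng(seed)
    n = len(points)
    center_idx = [int(i) for i in rng.choice(n, size=k, replace=False)]
    best = mpo_cost(points, penalties, points[center_idx], q)

    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in center_idx:
                    continue
                trial = center_idx.copy()
                trial[pos] = cand               # single swap: one out, one in
                cost = mpo_cost(points, penalties, points[trial], q)
                if cost < (1 - eps) * best:     # accept only a significant improvement
                    center_idx, best = trial, cost
                    improved = True
                    break
            if improved:
                break
    return center_idx, best
```

For a fixed center set, the cheapest penalty and outlier subsets under these assumptions follow greedily, which is why the sketch searches only over centers. The `eps` threshold forces each accepted swap to reduce the cost by a constant factor, a common device for bounding the number of iterations in local search.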
