We consider the problem of partitioning a finite sequence of points in Euclidean space into a given number of clusters (subsequences) so as to minimize the sum, over all clusters, of the intracluster sums of squared distances from the cluster elements to the cluster centers. The center of one of the desired clusters is assumed to be the origin, while the centers of the other clusters are unknown and are defined as the mean values of the cluster elements. In addition, the elements of the sequence that belong to the clusters with unknown centers must satisfy the following structural constraints: (1) the concatenation of the indices of the elements of these clusters is an increasing sequence, (2) the difference between any two consecutive indices is bounded below and above by prescribed constants, and (3) the total number of elements in these clusters is given as part of the input. It is shown that the problem is strongly NP-hard. A 2-approximation algorithm is proposed that runs in polynomial time when the number of clusters is fixed.
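For orientation, the problem can be written out as a formal sketch. The notation here is an assumption introduced for illustration and is not taken from the abstract: $y_1, \dots, y_N$ is the input sequence, $\mathcal{M}_1, \dots, \mathcal{M}_J$ are the index sets of the clusters with unknown centers, $\mathcal{M}$ is their union, $n_1, \dots, n_M$ are the indices obtained by concatenating $\mathcal{M}_1, \dots, \mathcal{M}_J$ in that order, and $T_{\min}$, $T_{\max}$ are the prescribed bounds on index differences. Counting the cluster centered at the origin, the total number of clusters in this notation is $J + 1$.

```latex
\[
  \sum_{j=1}^{J} \sum_{n \in \mathcal{M}_j}
    \bigl\| y_n - \bar{y}(\mathcal{M}_j) \bigr\|^2
  \;+\; \sum_{n \in \{1,\dots,N\} \setminus \mathcal{M}} \| y_n \|^2
  \;\longrightarrow\; \min,
  \qquad
  \bar{y}(\mathcal{M}_j) = \frac{1}{|\mathcal{M}_j|} \sum_{n \in \mathcal{M}_j} y_n ,
\]
subject to
\[
  n_1 < n_2 < \dots < n_M, \qquad
  T_{\min} \le n_m - n_{m-1} \le T_{\max} \quad (m = 2, \dots, M), \qquad
  \sum_{j=1}^{J} |\mathcal{M}_j| = M .
\]
```

The first sum collects the clusters whose centers are their means, the second sum is the cluster centered at the origin, and the three conditions correspond to constraints (1) to (3).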