Abstract
Clustering addresses the problem of assigning similar objects to groups. Since the size of the clusters is often constrained in practical clustering applications, various capacitated clustering problems have received increasing attention. We consider here the capacitated p-median problem (CPMP) in which p objects are selected as cluster centers (medians) such that the total distance from these medians to their assigned objects is minimized. Each object is associated with a weight, and the total weight in each cluster must not exceed a given capacity. Numerous exact and heuristic solution approaches have been proposed for the CPMP. The state-of-the-art approach performs well for instances with up to 5,000 objects but becomes computationally expensive for instances with a much larger number of objects. We propose a matheuristic with new problem decomposition strategies that can deal with instances comprising up to 500,000 objects. In a computational experiment, the proposed matheuristic consistently outperformed the state-of-the-art approach on medium- and large-scale instances while having similar performance for small-scale instances. As an extension, we show that our matheuristic can be applied to related capacitated clustering problems, such as the capacitated centered clustering problem (CCCP). For several test instances of the CCCP, our matheuristic found new best-known solutions.
Highlights
Clustering is the task of assigning similar objects to groups, where the similarity between a pair of objects is determined by a distance measure based on features of the objects
We show that our matheuristic can be applied to related capacitated clustering problems, such as the capacitated centered clustering problem (CCCP)
We proposed a matheuristic that is designed for instances with a large number of objects
Summary
Clustering is the task of assigning similar objects to groups (clusters), where the similarity between a pair of objects is determined by a distance measure based on features of the objects. We propose a matheuristic with new problem decomposition strategies that are designed for large-scale instances These strategies (a) focus on subproblems with the potential for substantially improving the objective function value, (b) exploit the power of binary linear programming to ensure the feasibility with respect to the capacity constraints during the entire solution process, and (c) apply efficient data structures (k-d trees; Bentley 1975) to avoid computing a large number of pairwise distances. In the global optimization phase, we decompose the CPMP into a series of generalized assignment problems, which are formulated as binary linear programs and solved using a mathematical programming solver In each of these subproblems, objects are optimally assigned to fixed medians subject to the capacity constraints.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have