Abstract

Clustering addresses the problem of assigning similar objects to groups. Since the size of the clusters is often constrained in practical clustering applications, various capacitated clustering problems have received increasing attention. We consider here the capacitated p-median problem (CPMP) in which p objects are selected as cluster centers (medians) such that the total distance from these medians to their assigned objects is minimized. Each object is associated with a weight, and the total weight in each cluster must not exceed a given capacity. Numerous exact and heuristic solution approaches have been proposed for the CPMP. The state-of-the-art approach performs well for instances with up to 5,000 objects but becomes computationally expensive for instances with a much larger number of objects. We propose a matheuristic with new problem decomposition strategies that can deal with instances comprising up to 500,000 objects. In a computational experiment, the proposed matheuristic consistently outperformed the state-of-the-art approach on medium- and large-scale instances while having similar performance for small-scale instances. As an extension, we show that our matheuristic can be applied to related capacitated clustering problems, such as the capacitated centered clustering problem (CCCP). For several test instances of the CCCP, our matheuristic found new best-known solutions.

Highlights

  • Clustering is the task of assigning similar objects to groups, where the similarity between a pair of objects is determined by a distance measure based on features of the objects

  • We show that our matheuristic can be applied to related capacitated clustering problems, such as the capacitated centered clustering problem (CCCP)

  • We propose a matheuristic that is designed for instances with a large number of objects


Summary

Introduction

Clustering is the task of assigning similar objects to groups (clusters), where the similarity between a pair of objects is determined by a distance measure based on features of the objects. We propose a matheuristic with new problem decomposition strategies that are designed for large-scale instances. These strategies (a) focus on subproblems with the potential for substantially improving the objective function value, (b) exploit the power of binary linear programming to ensure feasibility with respect to the capacity constraints during the entire solution process, and (c) apply efficient data structures (k-d trees; Bentley 1975) to avoid computing a large number of pairwise distances. In the global optimization phase, we decompose the CPMP into a series of generalized assignment problems, which are formulated as binary linear programs and solved using a mathematical programming solver. In each of these subproblems, objects are optimally assigned to fixed medians subject to the capacity constraints.
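
To make the assignment subproblem concrete, the following is a minimal sketch, not the authors' implementation. It assumes SciPy's `cKDTree` for the nearest-neighbor search and `scipy.optimize.milp` as the solver. The function name `assign_to_fixed_medians`, the parameter `n_nearest`, and the choice to allow each object only its `n_nearest` closest medians are illustrative assumptions; the restriction keeps the binary linear program small but can make an otherwise feasible subproblem infeasible.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def assign_to_fixed_medians(points, weights, median_idx, capacity, n_nearest=3):
    """Assign every object to one fixed median without exceeding the capacity.

    Hypothetical helper: each object may only use its n_nearest closest medians.
    Returns, per object, the index (into points) of its median, or None if the
    restricted subproblem has no feasible solution.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    median_idx = np.asarray(median_idx)
    n, p = len(points), len(median_idx)
    k = min(n_nearest, p)

    # k-d tree over the medians only: no full n-by-p distance matrix is built.
    tree = cKDTree(points[median_idx])
    dist, nbr = tree.query(points, k=k)
    dist, nbr = dist.reshape(n, k), nbr.reshape(n, k)

    # One binary variable per (object, candidate median) pair.
    n_vars = n * k
    cols = np.arange(n_vars)
    obj_rows = np.repeat(np.arange(n), k)

    # Each object is assigned to exactly one of its candidate medians.
    assign = csr_matrix((np.ones(n_vars), (obj_rows, cols)), shape=(n, n_vars))
    # The total weight assigned to each median must not exceed the capacity.
    load = csr_matrix((np.repeat(weights, k), (nbr.ravel(), cols)), shape=(p, n_vars))

    constraints = [
        LinearConstraint(assign, np.ones(n), np.ones(n)),
        LinearConstraint(load, np.full(p, -np.inf), np.full(p, float(capacity))),
    ]
    res = milp(
        c=dist.ravel(),  # minimize the total object-to-median distance
        constraints=constraints,
        integrality=np.ones(n_vars),
        bounds=Bounds(0, 1),
    )
    if not res.success:
        return None
    chosen = res.x.reshape(n, k).argmax(axis=1)
    return median_idx[nbr[np.arange(n), chosen]]
```

In a full heuristic, a step like this would alternate with a median update for each cluster; that loop is omitted here. If the function returns `None`, one option is to retry with a larger `n_nearest`.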

Description of the problem
Illustrative example
Literature review
Exact approaches
Metaheuristics
Matheuristics
Proposed matheuristic
Global optimization phase
Local optimization phase
Search for nearest neighbors using k-d trees
Computational experiment
Selection of control parameters
Experimental design
Numerical results
Capacitated centered clustering problem
Findings
Conclusions