Abstract

This paper proposes a novel supervised clustering algorithm for analyzing large datasets. The algorithm models clustering as a matching problem between two disjoint sets of agents, namely centroids and data points. This view of the clustering problem makes the algorithm multi-objective, because each agent may have its own objective function. Here, the algorithm is used to maximize purity and similarity within each cluster simultaneously. The algorithm shows promising performance on two different transportation datasets. The first dataset contains speed measurements along a section of Interstate 64 (I-64) in Virginia; the second contains the bike station status of a bike sharing system (BSS) in the San Francisco Bay Area. We clustered each dataset separately to examine how traffic and bike patterns change within clusters, and then determined when and where the freeway would be congested and the BSS would be imbalanced. Based on a spatial analysis of these congestion states and imbalance points, we propose potential solutions that decision makers and agencies can use to improve the operations of I-64 and the BSS. We demonstrate that, on our datasets, the proposed algorithm produces better results than classical $k$-means clustering algorithms with respect to a time event. The contributions of this paper are: 1) a multi-objective clustering algorithm; 2) an algorithm that is scalable (polynomial order), fast, and simple; and 3) an algorithm that simultaneously identifies a stable number of clusters and clusters the data.
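
The abstract frames clustering as a two-sided matching between centroids and data points, each side with its own objective. The sketch below shows one way such a framing can work. It is a minimal, hypothetical illustration, not the authors' algorithm: it assumes a many-to-one deferred-acceptance (Gale-Shapley style) matching in which each point ranks centroids by distance (similarity objective) and each centroid, holding at most a hypothetical `capacity` of points, prefers points whose label matches its majority label (purity objective). The names `stable_assign`, `matching_cluster`, `purity`, and `capacity` are illustrative assumptions. Unlike the paper's method, this sketch uses a fixed number of clusters `k` and does not determine the number of clusters itself.

```python
import math
from collections import Counter

def stable_assign(points, labels, centroids, centroid_labels, capacity):
    """Hypothetical many-to-one deferred acceptance: points propose to centroids,
    and each centroid keeps at most `capacity` of its preferred proposers."""
    n, k = len(points), len(centroids)
    # k * capacity >= n guarantees that every point is eventually held somewhere.
    assert k * capacity >= n, "total capacity must cover all points"
    # Point side (similarity objective): prefer the closest centroid first.
    prefs = [sorted(range(k), key=lambda c: math.dist(points[i], centroids[c]))
             for i in range(n)]

    # Centroid side (purity objective): same-label points first, then closer ones.
    def score(c, i):
        return (labels[i] != centroid_labels[c], math.dist(points[i], centroids[c]))

    next_choice = [0] * n
    held = [[] for _ in range(k)]
    free = list(range(n))
    while free:
        i = free.pop()
        c = prefs[i][next_choice[i]]
        next_choice[i] += 1
        held[c].append(i)
        if len(held[c]) > capacity:
            # Centroid c rejects its least-preferred point, which proposes again later.
            held[c].sort(key=lambda j: score(c, j))
            free.append(held[c].pop())
    assign = [0] * n
    for c, members in enumerate(held):
        for i in members:
            assign[i] = c
    return assign

def matching_cluster(points, labels, k, capacity, iters=10):
    """Alternate matching and centroid updates (assumed, k-means-like loop)."""
    centroids = [list(p) for p in points[:k]]
    centroid_labels = list(labels[:k])
    assign = []
    for _ in range(iters):
        assign = stable_assign(points, labels, centroids, centroid_labels, capacity)
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                continue
            dim = len(points[0])
            centroids[c] = [sum(points[i][d] for i in members) / len(members)
                            for d in range(dim)]
            # A centroid's label is the majority label of its members.
            centroid_labels[c] = Counter(labels[i] for i in members).most_common(1)[0][0]
    return assign, centroids

def purity(assign, labels):
    """Fraction of points that carry the majority label of their cluster."""
    total = 0
    for c in set(assign):
        members = [labels[i] for i, a in enumerate(assign) if a == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
assign, centroids = matching_cluster(points, labels, k=2, capacity=3)
print(assign, purity(assign, labels))  # [0, 0, 0, 1, 1, 1] 1.0
```

In this toy run both initial centroids start inside the "a" group. The capacity limit combined with the centroids' label preference pushes the two groups apart within two iterations. Deferred acceptance is used here only because it is a standard, polynomial-time way to compute a stable two-sided matching. The abstract does not say which matching procedure the paper uses.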
