With the rapid development of sensor and communication technologies, it has become easy to collect large-scale, long-term positioning data on crowd movement, which brings new opportunities for studying crowd movement patterns. This paper proposes a novel Statistical and Density-Based Clustering algorithm (SDBC) to identify implicit, statistically significant spatial aggregation patterns in geographical flow data. Unlike existing flow clustering algorithms, this method identifies hot spots of origin-destination (OD) flows based on local spatial statistics and density-growing clustering. It also evaluates the significance of the identified geographic flow clusters, effectively reducing the number of spurious clusters that arise by chance. In our method, the spatial neighborhood of each flow is first obtained based on spatial proximity, temporal similarity, and directional similarity. Then, the number of flows in the spatial neighborhood of each flow is calculated and used as the density measure. Based on this density, high-density flows are automatically detected using local spatial aggregation statistics, and a hierarchical density-based clustering strategy is developed to merge adjacent high-density flows into candidate flow clusters. Finally, we perform permutation tests to infer the statistical significance of each flow cluster and eliminate the candidate clusters generated by chance. Experiments on synthetic data and real-world taxi trajectory data were conducted to evaluate the effectiveness of the proposed method. Results show that the proposed method can accurately identify statistically significant flow clusters of different shapes and densities and performs better than the available state-of-the-art flow clustering algorithms.
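The neighborhood, density, and permutation-test steps described above can be illustrated with a minimal Python sketch. It is not the SDBC implementation: the `Flow` type, the thresholds `max_dist`, `max_dt`, and `max_angle`, the specific proximity and direction measures, and the null model (randomly re-pairing origins with destinations) are all assumptions made for illustration, and the hierarchical merging of high-density flows into clusters is omitted.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    """Hypothetical OD flow: origin (ox, oy), destination (dx, dy), time t."""
    ox: float
    oy: float
    dx: float
    dy: float
    t: float


def angle_between(a: Flow, b: Flow) -> float:
    """Angle in radians between the direction vectors of two flows."""
    ax, ay = a.dx - a.ox, a.dy - a.oy
    bx, by = b.dx - b.ox, b.dy - b.oy
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return 0.0  # assumption: a zero-length flow has no direction constraint
    cos = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
    return math.acos(cos)


def is_neighbor(a, b, max_dist, max_dt, max_angle):
    """Assumed neighborhood rule: close origins and destinations,
    similar times, and similar directions."""
    close = (math.hypot(a.ox - b.ox, a.oy - b.oy) <= max_dist
             and math.hypot(a.dx - b.dx, a.dy - b.dy) <= max_dist)
    return (close and abs(a.t - b.t) <= max_dt
            and angle_between(a, b) <= max_angle)


def densities(flows, max_dist, max_dt, max_angle):
    """Density of each flow = number of other flows in its neighborhood."""
    return [
        sum(1 for j, g in enumerate(flows)
            if j != i and is_neighbor(f, g, max_dist, max_dt, max_angle))
        for i, f in enumerate(flows)
    ]


def max_density_p_value(flows, max_dist, max_dt, max_angle,
                        n_perm=199, seed=0):
    """Monte Carlo permutation p-value for the observed maximum density.

    Assumed null model: destinations are shuffled among flows, so origin
    and destination locations are kept but their pairing is random.
    Expects a non-empty list of flows.
    """
    rng = random.Random(seed)
    observed = max(densities(flows, max_dist, max_dt, max_angle))
    dests = [(f.dx, f.dy) for f in flows]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(dests)
        null = [replace(f, dx=d[0], dy=d[1]) for f, d in zip(flows, dests)]
        if max(densities(null, max_dist, max_dt, max_angle)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

The pairwise density computation is quadratic in the number of flows, which is acceptable for a small demonstration; a spatial index would be needed at the scale of real taxi trajectory data. The sketch tests only the single densest flow, whereas the method described above tests each candidate cluster.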