Abstract

Nowadays, urban multimodal big data are freely available to the public owing to the growing number of cities, and these data play a critical role in many fields such as transportation, education, medical treatment, and land resource management. Poverty is a severe challenge for human society, and the successful completion of poverty-relief work can greatly improve the quality of people's lives and ensure the sustainable development of society. Traditional poverty alleviation methods consume a great deal of manpower, material resources, and financial resources, so it is of great significance to apply machine learning to mine different categories of poverty-stricken households and thereby provide decision support for poverty alleviation. Based on density-based spatial clustering of applications with noise (DBSCAN), this paper designs a hierarchical DBSCAN clustering algorithm to identify and analyze the categories of poverty-stricken households in China. First, the proposed method dynamically adjusts the neighborhood radius to divide the data space into several initial clusters with different densities. Then, neighboring clusters are repeatedly identified by their border and inner distances and recursively aggregated to form new clusters. Based on this idea of division and aggregation, the proposed method can recognize clusters of different shapes and handle noise effectively in data spaces with imbalanced density distributions. The experiments indicate that the method achieves the desired clustering performance and reasonably identifies the commonalities and differences in the characteristics of poverty-stricken households. In terms of the Accuracy indicator, the proposed method improves accuracy by 2.3% compared with other methods.
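The abstract does not define the border and inner distances or the exact merge test, so the following Python sketch fills them in with hypothetical choices to show the shape of the aggregation step. Here the inner distance of a cluster is assumed to be the mean nearest-neighbor distance among its points, the border distance between two clusters is assumed to be the closest cross-cluster pair of points, and two clusters are treated as neighbors when the border distance is at most `ratio` times the larger inner distance. The names `inner_distance`, `border_distance`, `aggregate_clusters`, and `ratio` are illustrative, not the authors' code, and the recursive aggregation is written as a loop that merges the closest qualifying pair until no pair qualifies.

```python
import numpy as np
from itertools import combinations


def inner_distance(P):
    """Hypothetical inner distance: mean nearest-neighbor distance inside P."""
    if len(P) < 2:
        return 0.0  # a singleton has no internal spacing
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return float(d.min(axis=1).mean())


def border_distance(P, Q):
    """Hypothetical border distance: closest pair of points across P and Q."""
    return float(np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1).min())


def aggregate_clusters(X, labels, ratio=2.0):
    """Repeatedly merge neighboring clusters until no pair qualifies.

    Assumed neighbor rule (not from the paper): border distance is at most
    `ratio` times the larger of the two inner distances. Noise points
    (label -1) are left untouched; cluster ids are not renumbered.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    while True:
        ids = sorted(set(labels.tolist()) - {-1})
        best = None  # (border distance, cluster a, cluster b)
        for a, b in combinations(ids, 2):
            Pa, Pb = X[labels == a], X[labels == b]
            border = border_distance(Pa, Pb)
            limit = ratio * max(inner_distance(Pa), inner_distance(Pb))
            if border <= limit and (best is None or border < best[0]):
                best = (border, a, b)
        if best is None:
            return labels  # no neighboring pair left to merge
        _, a, b = best
        labels[labels == b] = a  # merge cluster b into cluster a
```

For example, with points at x = 0 to 6, 20 to 22, and 50 on the x-axis and initial labels `[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, -1]`, clusters 0 and 1 touch (border distance 1 against inner distances of 1) and are merged, while cluster 2 and the noise point stay separate. Every round compares all cluster pairs using dense distance matrices, so this sketch only suits small illustrations.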

Highlights

  • With the development of information and communication technology, the era of multimodal big data has fully arrived

  • Because the proposed method is a hierarchical density-based spatial clustering of applications with noise (DBSCAN) algorithm based on the initial division and aggregation of neighbor clusters, its running time is higher than that of traditional DBSCAN

  • Conclusions: This paper designs a hierarchical DBSCAN algorithm based on the initial division and aggregation of neighbor clusters


Summary

Introduction

With the development of information and communication technology, the era of multimodal big data has fully arrived. The design of clustering algorithms for poverty datasets should reasonably account for noise caused by missing values and outliers. Parmar et al. [27] proposed a residual error-based density peak clustering algorithm named REDPC to better handle datasets comprising various data distribution patterns. Considering that clusters in real-world datasets may differ in size, shape, and density and may be accompanied by noise and outliers, this paper adopts the idea of initial division and hierarchical aggregation to design a clustering algorithm named hierarchical DBSCAN (HDBSCAN). (1) First, it makes an initial division of the dataset based on sample densities; that is, the proposed method uses the neighbor information of samples to calculate local density values and then, visiting unlabeled core points in descending order of density, searches for the set of samples density-connected to each one, forming the initial clusters.
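Step (1) can be sketched in Python under stated assumptions: a fixed radius `eps` stands in for the dynamically adjusted neighborhood radius, the local density of a sample is assumed to be the number of samples within `eps` (itself included), and a sample whose density is at least `min_pts` is treated as a core point. The names `initial_division`, `eps`, and `min_pts` are hypothetical. Core points are visited in descending order of density, and each unlabeled one seeds a breadth-first search that collects every sample density-connected to it; samples never reached keep the label -1 as noise. This is an illustration of the described step, not the authors' implementation, and it is not the `sklearn.cluster.HDBSCAN` estimator, which implements a different algorithm under the same name.

```python
import numpy as np
from collections import deque


def initial_division(X, eps=0.5, min_pts=4):
    """Hypothetical sketch of the initial-division step.

    Assumptions: fixed radius `eps` (not the paper's dynamic radius);
    local density = number of samples within `eps`, including the sample.
    Returns (labels, density); label -1 means noise / never reached.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, acceptable for a sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    density = np.array([len(nb) for nb in neighbors])
    is_core = density >= min_pts

    labels = np.full(n, -1)
    cluster_id = 0
    # Visit samples in descending order of local density.
    for seed in np.argsort(-density, kind="stable"):
        if not is_core[seed] or labels[seed] != -1:
            continue  # skip non-core points and already-labeled cores
        # Breadth-first search over density-connected samples.
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if is_core[q]:
                        queue.append(q)  # only core points are expanded
        cluster_id += 1
    return labels, density
```

On two tight groups of four points plus one isolated point, for example, `initial_division(X, eps=0.2, min_pts=3)` yields two initial clusters and labels the isolated point as noise. Border points are assigned to whichever cluster reaches them first, as in standard DBSCAN.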

