Abstract

Data clustering plays a significant role in geospatial data management and analytics. In this light, we propose and study a novel geospatial data clustering method for multiple reference points. Given a set Q of geospatial data points, a candidate set O of reference points, and a threshold k, each data point q is matched to its closest reference point o. The multi-reference clustering (MRC) method finds a subset A of at most k reference points from O (A ⊆ O ∧ |A| ≤ k) that minimizes the global travel distance $\sum_{q \in Q} \min_{o \in A} d(q, o)$; the data are thereby grouped into |A| clusters. We believe that the MRC method may benefit many applications, including geospatial data clustering, data classification, and data analytics in general. The MRC problem is challenging due to its high computational complexity: there are $\sum_{i=1}^{k} \binom{|O|}{i} = \sum_{i=1}^{k} \frac{|O|!}{i!\,(|O|-i)!}$ possibilities for subset A. Because the exact solution cannot be computed in real time, we develop a heuristic method to select subset A from O efficiently. The experimental results show that the accuracy of the selected subset A is very close to that of the optimal solution. In addition, we develop a set of optimization techniques to further enhance efficiency. Finally, we conduct extensive experiments to study the efficiency and accuracy of the heuristic method.
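To make the objective and the size of the search space concrete, here is a minimal Python sketch of the exact, exhaustive approach. It assumes points are 2-D tuples and that d(q, o) is Euclidean distance; the names `global_travel_distance`, `num_candidate_subsets`, and `exact_mrc` are hypothetical and not taken from the paper. This is the brute-force baseline that the heuristic is meant to avoid, not the authors' heuristic.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # assumption: 2-D points


def global_travel_distance(Q: Sequence[Point], A: Sequence[Point]) -> float:
    """Sum over q in Q of the (assumed Euclidean) distance to its closest o in A."""
    return sum(min(math.dist(q, o) for o in A) for q in Q)


def num_candidate_subsets(n: int, k: int) -> int:
    """Number of subsets A of O with 1 <= |A| <= k, where n = |O|."""
    return sum(math.comb(n, i) for i in range(1, k + 1))


def exact_mrc(
    Q: Sequence[Point], O: Sequence[Point], k: int
) -> Tuple[Optional[Tuple[Point, ...]], float]:
    """Exhaustive search: evaluate every subset of O with 1 <= |A| <= k."""
    best_A, best_cost = None, math.inf
    for i in range(1, min(k, len(O)) + 1):
        for A in combinations(O, i):
            cost = global_travel_distance(Q, A)
            if cost < best_cost:
                best_A, best_cost = A, cost
    return best_A, best_cost


if __name__ == "__main__":
    Q = [(0, 0), (1, 0), (10, 0), (11, 0)]
    O = [(0.5, 0), (10.5, 0), (5, 0)]
    print(exact_mrc(Q, O, 2))  # (((0.5, 0), (10.5, 0)), 2.0)
```

For |O| = 100 and k = 5, `num_candidate_subsets(100, 5)` is 79,375,495, and evaluating each subset costs |Q|·|A| distance computations, which illustrates why exhaustive search does not scale.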

Highlights

  • With the rapid development of GPS and navigation technology and of location-based social networks, geospatial data (e.g., POIs, geo-tagged tweets and photos) are pervasive in our daily life

  • The multi-reference clustering (MRC) method finds a subset A (A ⊆ O ∧ |A| ≤ k) of reference points from O that minimizes the global travel distance $\sum_{q \in Q} \min_{o \in A} d(q, o)$, grouping the geospatial data points into |A| ≤ k clusters

  • This type of query is useful in many applications including geospatial data clustering, data classification, and data analytics in general

Summary

INTRODUCTION

With the rapid development of GPS and navigation technology and of location-based social networks, geospatial data (e.g., POIs, geo-tagged tweets and photos) are pervasive in our daily life. The multi-reference clustering (MRC) method finds a subset A (A ⊆ O ∧ |A| ≤ k) of reference points from O that minimizes the global travel distance $\sum_{q \in Q} \min_{o \in A} d(q, o)$, grouping the geospatial data points into |A| ≤ k clusters. For each possible subset A, we match each data point to its closest reference point and compute the global travel distance. The approximation algorithm for k-medoids clustering cannot be used for the MRC problem because it does not support add and drop operations, and it has no pruning techniques to accelerate query processing. Given a set Q of data points and a set A of selected reference points, the global travel distance $d_g(Q, A)$ is defined as $d_g(Q, A) = \sum_{q \in Q} \min_{o \in A} d(q, o)$.
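The add and drop operations mentioned above can be illustrated by maintaining the global travel distance incrementally as reference points enter or leave A. The sketch below uses a hypothetical helper class, `GlobalTravelDistance`, not the paper's data structure; it assumes 2-D points and Euclidean d(q, o) and simply records each data point's closest selected reference point. An add needs one pass over Q, since a new reference point can only shorten distances; a drop re-matches only the data points whose closest reference point was removed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # assumption: 2-D points, Euclidean distance


class GlobalTravelDistance:
    """Hypothetical helper that keeps d_g(Q, A) up to date under add/drop."""

    def __init__(self, Q: Sequence[Point]) -> None:
        self.Q: List[Point] = list(Q)
        self.A: List[Point] = []
        # nearest[i] = (distance, reference point) for the closest o in A to Q[i].
        self.nearest: List[Optional[Tuple[float, Point]]] = [None] * len(self.Q)

    def total(self) -> float:
        """Current global travel distance; infinite while A is empty."""
        if not self.A:
            return math.inf
        return sum(n[0] for n in self.nearest)

    def add(self, o: Point) -> None:
        """Add operation: one pass over Q, since o can only shorten distances."""
        self.A.append(o)
        for i, q in enumerate(self.Q):
            d = math.dist(q, o)
            if self.nearest[i] is None or d < self.nearest[i][0]:
                self.nearest[i] = (d, o)

    def drop(self, o: Point) -> None:
        """Drop operation: re-match only the data points whose closest reference was o."""
        self.A.remove(o)
        for i, q in enumerate(self.Q):
            current = self.nearest[i]
            if current is not None and current[1] == o:
                self.nearest[i] = min(
                    ((math.dist(q, a), a) for a in self.A), default=None
                )
```

Starting from the data points (0, 0), (1, 0), (10, 0), and (11, 0), adding (5, 0), (0.5, 0), and (10.5, 0) in turn gives totals of 20.0, 12.0, and 2.0; dropping (0.5, 0) afterwards re-matches only the first two data points and yields 10.0.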

Result
BOUNDS
ALGORITHM
TIME COMPLEXITY
THREE OPERATIONS
EXPERIMENTS
EFFECT OF k
CONCLUSION
