Abstract

Spatial applications often require the ability to perform similarity search over a collection of point sets. For example, given the geographical distribution of a disease outbreak, find the k historical outbreaks with the most similar spatial distributions in a data collection D. In this paper, we study the problem of similarity search over a collection of point sets using the Hausdorff distance, a measure commonly used to determine the maximum discrepancy between two point sets. To avoid computing the Hausdorff distance for every point set S in D, one may compute an optimistic estimate (i.e., a lower bound) of the actual Hausdorff distance HausDist(Q,S) for each S and use it to rule out sets that are obviously dissimilar to the query Q. In our investigation, we observed that a commonly used estimation method (called BscLB) may produce a result that is not indicative of the actual Hausdorff distance. Consequently, we propose a method (called EnhLB) that produces a tighter estimate than the existing one. We then formulate a similarity search algorithm that uses a combination of BscLB and EnhLB to find similar point sets efficiently. We also extend our method to support an outlier-resistant variant of the Hausdorff distance called the modified Hausdorff distance. We compare our proposed algorithm with an algorithm that uses only BscLB. The results of our experiments show a reduction in computation time of 72% for searches using the Hausdorff distance and of 53% for searches using the modified Hausdorff distance.
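
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's BscLB or EnhLB, whose definitions are not given here. As an assumption for illustration, it uses a simple bounding-box bound: every point of one set is at least as far from the other set as it is from that set's minimum bounding rectangle, so the largest such distance cannot exceed the Hausdorff distance. The names `lower_bound` and `knn_search`, and the restriction to 2D Euclidean points, are hypothetical choices made for this example.

```python
import heapq
import math


def directed_hausdorff(a, b):
    # Largest distance from a point of `a` to its nearest neighbour in `b`.
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    # Symmetric Hausdorff distance: the larger of the two directed distances.
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mbr(points):
    # Minimum bounding rectangle of 2D points as (xmin, ymin, xmax, ymax).
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def mindist_point_rect(p, rect):
    # Euclidean distance from point p to the nearest point of the rectangle
    # (zero when p lies inside it).
    x0, y0, x1, y1 = rect
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy)


def lower_bound(a, b):
    # Illustrative bound (an assumption, not BscLB or EnhLB): each point is at
    # least as far from the other set as from that set's bounding rectangle.
    ra, rb = mbr(a), mbr(b)
    return max(
        max(mindist_point_rect(p, rb) for p in a),
        max(mindist_point_rect(q, ra) for q in b),
    )


def knn_search(query, collection, k):
    # Visit candidates in ascending order of lower bound and stop once the
    # bound reaches the current k-th best exact distance.
    candidates = sorted(
        (lower_bound(query, s), i) for i, s in enumerate(collection)
    )
    best = []  # max-heap emulated with negated distances: (-distance, index)
    exact_computations = 0
    for lb, i in candidates:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining set can beat the current k-th best
        d = hausdorff(query, collection[i])
        exact_computations += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    results = sorted((-neg_d, i) for neg_d, i in best)
    return results, exact_computations
```

Because the bound never exceeds the true distance, stopping early does not change the returned distances; it only skips exact computations for sets that cannot enter the top k. A tighter bound lets the loop stop sooner, which is the effect the abstract attributes to EnhLB.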
