Efficient Shared Execution Processing of k-Nearest Neighbor Joins in Road Networks

Hyung-Ju Cho

doi:10.1155/2018/1243289

Abstract

We investigate the k-nearest neighbor (kNN) join in road networks to determine the k-nearest neighbors (NNs) from a dataset S to every object in another dataset R. The kNN join is a primitive operation and is widely used in many data mining applications. However, it is an expensive operation because it combines the kNN query and the join operation. Moreover, most existing methods assume the use of the Euclidean distance metric. We instead consider the problem of processing kNN joins in road networks, where the distance between two points is the length of the shortest path connecting them. We propose a shared execution-based approach called the group-nested loop (GNL) method that can efficiently evaluate kNN joins in road networks by exploiting grouping and shared execution. The GNL method can be easily implemented using existing kNN query algorithms. Extensive experiments using several real-life roadmaps confirm the superior performance and effectiveness of the proposed method in a wide range of problem settings.

Highlights

Road networks are often represented as weighted undirected graphs by placing a graph vertex at each road intersection or terminus and connecting vertices by edges that represent each segment of a road between two vertices [1,2,3,4,5]. The distance between two points in a road network is the length of the shortest path between them.
We investigate the k-nearest neighbor join in road networks, which combines each object in a dataset with the k objects in another dataset that are closest to it [6,7,8]. The kNN join is a primitive operation, which is widely used in many data mining and analytic applications, such as kNN classification, k-means clustering, sample assessment and sample post processing, missing value imputation, and k-distance diagrams [6,7,8,9,10,11,12].
The basic group-nested loop (GNL) method only evaluates two kNN queries to retrieve the kNNs of all the outer objects in a segment, which decreases the processing time significantly. The baseline, basic GNL, and GNL methods evaluate a total of 50,000 kNN queries, 1,509 kNN queries, and 743 kNN queries, respectively. This means that the numbers of kNN queries evaluated by the basic GNL and GNL methods represent 3% and 1.5% of the kNN queries evaluated by the baseline method, respectively.
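
The idea of answering all outer objects in a segment with only two kNN queries can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a segment is a single edge (u, v), that inner objects sit on vertices, and that each outer object is given by its offset from u along the edge. Under these assumptions every path from an outer object leaves the edge through u or v, so the kNNs of u and v together contain a valid kNN set for every object on the edge. The names knn_query and shared_knn_for_edge, and the toy graph, are hypothetical.

```python
import heapq

def knn_query(graph, source, inner, k):
    """Up to k (distance, vertex) pairs for the inner objects nearest to
    source, found by Dijkstra's algorithm with early termination."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in inner:
            result.append((d, u))
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

def shared_knn_for_edge(graph, u, v, length, offsets, inner, k):
    """kNNs of every outer object on edge (u, v) from two shared queries.
    offsets holds each outer object's distance from u along the edge.
    Assumption: inner objects are located on vertices only."""
    from_u = knn_query(graph, u, inner, k)  # shared query 1
    from_v = knn_query(graph, v, inner, k)  # shared query 2
    answers = []
    for t in offsets:
        best = {}
        for d, s in from_u:  # reach s by leaving the edge through u
            best[s] = min(best.get(s, float("inf")), t + d)
        for d, s in from_v:  # reach s by leaving the edge through v
            best[s] = min(best.get(s, float("inf")), (length - t) + d)
        answers.append(sorted((d, s) for s, d in best.items())[:k])
    return answers

# Toy road network: vertex -> list of (neighbor, edge length).
graph = {
    "a": [("b", 2.0), ("d", 3.0)],
    "b": [("a", 2.0), ("c", 1.0), ("d", 5.0)],
    "c": [("b", 1.0), ("d", 4.0)],
    "d": [("c", 4.0), ("a", 3.0), ("b", 5.0)],
}
# Two outer objects on edge (c, d), at offsets 1.0 and 3.5 from c.
answers = shared_knn_for_edge(graph, "c", "d", 4.0, [1.0, 3.5], {"a", "b"}, k=1)
print(answers)
```

In this toy example the two outer objects get different nearest neighbors, yet both answers come from the same pair of endpoint queries, which is where the saving over one query per outer object arises.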

Summary

Introduction

Road networks are often represented as weighted undirected graphs by placing a graph vertex at each road intersection or terminus and connecting vertices by edges that represent each segment of a road between two vertices [1,2,3,4,5]. The distance between two points in a road network is the length of the shortest path between them. We investigate the k-nearest neighbor (kNN) join in road networks, which combines each object in a dataset with the k objects in another dataset that are closest to it [6,7,8]. The kNN join is an expensive operation because it combines the kNN query and the join operation. Therefore, many studies have been performed to improve the efficiency of the kNN join [6,7,8,9,10,11,12]. Most of these previous studies focused on processing the kNN join in the Euclidean space, mainly by designing elegant indexing techniques to avoid scanning the entire dataset repeatedly and to prune as many distance computations as possible.
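
To make the problem setting concrete, the following is a minimal sketch of a road network as a weighted undirected graph and of a baseline kNN join that issues one kNN query per outer object. It assumes, for simplicity, that objects of both datasets are located on vertices; the function names, the toy graph, and the use of Dijkstra's algorithm as the kNN query are illustrative choices, not the paper's implementation.

```python
import heapq

def build_graph(edges):
    """Adjacency lists of a weighted undirected graph built from
    (u, v, length) triples, one triple per road segment."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph

def knn_query(graph, source, inner, k):
    """Up to k (distance, vertex) pairs for the inner objects nearest to
    source, where distance is the shortest-path length in the network."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in inner:
            result.append((d, u))  # vertices settle in distance order
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

def baseline_knn_join(graph, outer, inner, k):
    """Baseline join: one independent kNN query per outer object."""
    return {r: knn_query(graph, r, inner, k) for r in outer}

edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 4.0),
         ("a", "d", 3.0), ("b", "d", 5.0)]
graph = build_graph(edges)
join = baseline_knn_join(graph, outer=["a", "c"], inner={"b", "d"}, k=2)
print(join)
```

Because each outer object triggers its own network expansion, the number of kNN queries in this baseline equals the number of outer objects, which is the cost that grouping and shared execution aim to reduce.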

Methods

Results

Conclusion