The k-Nearest Neighbour Join: Turbo Charging the KDD Process

Christian Böhm, Florian Krebs

doi:10.1007/s10115-003-0122-9

Abstract

The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, in which the user defines a distance threshold for the join, and the closest pair query (or k-distance join), which retrieves the k most similar pairs. In this paper, we propose a third important similarity join operation, the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We show that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of their results. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.
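To make the difference between the three join operations concrete, here is a minimal brute-force sketch, assuming points are tuples of floats compared by Euclidean distance. It is not the MuX-based algorithm proposed in the paper: it uses plain nested loops with O(|R|·|S|) distance computations and only illustrates what each join returns. The function names `range_join`, `k_distance_join`, `knn_join` and `kmeans_assign` are hypothetical, chosen for illustration. The last function shows one way the assignment step of k-means can be phrased as a 1-nn join of the data points with the current centroids.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: points are float tuples and the metric is Euclidean distance.
Point = Tuple[float, ...]


def range_join(R: Sequence[Point], S: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Distance range join: every index pair (i, j) with dist(R[i], S[j]) <= eps."""
    return [
        (i, j)
        for i, r in enumerate(R)
        for j, s in enumerate(S)
        if math.dist(r, s) <= eps
    ]


def k_distance_join(R: Sequence[Point], S: Sequence[Point], k: int) -> List[Tuple[int, int]]:
    """Closest pair query / k-distance join: the k closest pairs over all of R x S."""
    pairs = ((math.dist(r, s), i, j) for i, r in enumerate(R) for j, s in enumerate(S))
    # Ties on distance are broken by (i, j) because tuples compare element-wise.
    return [(i, j) for _, i, j in heapq.nsmallest(k, pairs)]


def knn_join(R: Sequence[Point], S: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """k-nn join: for each R[i], the indices of its k nearest neighbours in S."""
    result: Dict[int, List[int]] = {}
    for i, r in enumerate(R):
        nearest = heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in enumerate(S)))
        result[i] = [j for _, j in nearest]
    return result


def kmeans_assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """k-means assignment step expressed as a 1-nn join of points with centroids."""
    nn = knn_join(points, centroids, 1)
    return [nn[i][0] for i in range(len(points))]
```

For example, with `R = [(0.0, 0.0), (5.0, 5.0)]` and `S = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0)]`, `k_distance_join(R, S, 2)` returns `[(0, 0), (1, 2)]`, whereas `knn_join(R, S, 2)` returns `{0: [0, 1], 1: [2, 1]}`. The k-distance join ranks pairs globally, so some points of R may get no partner at all; the k-nn join guarantees k partners for every point of R, which is what per-point algorithms such as the k-means assignment step need.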


Journal: Knowledge and Information Systems
Publication Date: Feb 27, 2004
Citations: 157

Similar Papers

High performance data mining using the nearest neighbor join
C Bohm ... F Krebs
09 Dec 2002

Supporting KDD Applications by the k-Nearest Neighbor Join
Christian Böhm ... Florian Krebs
01 Jan 2003

Coarse to fine K nearest neighbor classifier
Yong Xu ... Hong Liu
Pattern Recognition Letters | VOL. 34
16 Feb 2013

Adaptive Learning-Based k-Nearest Neighbor Classifiers With Resilience to Class Imbalance
Sankha Subhra Mullick ... Shounak Datta
IEEE Transactions on Neural Networks and Learning Systems | VOL. 29
27 Mar 2018
