Probabilistic analysis of vantage point trees

Vladyslav Bohun

doi:10.15559/21-vmsta188

Highlights

A vantage point tree is a data structure for fast execution of nearest neighbor search queries in a given metric space.
This class of trees was first introduced in 1993 in [17] and has been widely used since. It is not the only class of trees used for nearest neighbor search; other well-known examples include kd-trees [3, 12] and ball trees [14], among many other data structures.
A Markov chain $(X_h)_{h \ge 0}$ on a state space $S$ is a Harris chain if there are two sets $A, B \subset S$, a function $q$ with $q(x, y) \ge \varepsilon > 0$ for $x \in A$, $y \in B$, and a probability measure $\rho$ concentrated on $B$ such that the following two conditions hold: 1. $P\{\inf\{h \ge 0 : X_h \in A\} < \infty\} > 0$ for all possible starting states $X_0 \in S$; 2. if $x \in A$ and $C \subset B$, then $P\{X_{h+1} \in C \mid X_h = x\} \ge \int_C q(x, y)\,\rho(dy)$.

Summary

Introduction

A vantage point tree (vp-tree) is a data structure for fast execution of nearest neighbor search queries in a given metric space. This class of trees was first introduced in 1993 in [17] and has been widely used since. A final example we mention here is similarity search, such as the search for similar images or similar articles. Another major field where nearest neighbor search is an important tool is machine learning. We refer to [6], which gives a good exposition of the ideas behind solving classification problems with nearest neighbor search. These models have proved useful in areas of classification such as text categorization, multimedia categorization and search, pattern recognition, information retrieval and many others.
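
To make the data structure concrete, the following is a minimal Python sketch of a vp-tree with exact nearest neighbor search. It is an illustration only, not the construction analyzed in this paper: the names VPNode, build and nearest are hypothetical, the vantage point is assumed to be chosen uniformly at random, the threshold radius is assumed to be the median distance to the remaining points, and the metric defaults to the Euclidean distance.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any function satisfying the metric axioms would do."""
    return math.dist(a, b)


@dataclass
class VPNode:
    vantage: Point               # the vantage point stored at this node
    radius: float                # threshold distance splitting the other points
    inside: Optional["VPNode"]   # points p with dist(vantage, p) <= radius
    outside: Optional["VPNode"]  # points p with dist(vantage, p) > radius


def build(points: Sequence[Point], dist: Metric = euclidean,
          rng: Optional[random.Random] = None) -> Optional[VPNode]:
    """Recursively build a vp-tree (assumption: random vantage point, median radius)."""
    if not points:
        return None
    rng = rng or random.Random(0)
    rest: List[Point] = list(points)
    vantage = rest.pop(rng.randrange(len(rest)))
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    radius = statistics.median(dist(vantage, p) for p in rest)
    inside = [p for p in rest if dist(vantage, p) <= radius]
    outside = [p for p in rest if dist(vantage, p) > radius]
    return VPNode(vantage, radius,
                  build(inside, dist, rng), build(outside, dist, rng))


def nearest(node: Optional[VPNode], query: Point, dist: Metric = euclidean,
            best: Optional[Tuple[float, Point]] = None
            ) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = dist(query, node.vantage)
    if best is None or d < best[0]:
        best = (d, node.vantage)
    # Visit first the side of the sphere that contains the query.
    if d <= node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, query, dist, best)
    # By the triangle inequality, every point on the other side is at least
    # |d - radius| away from the query, so that side can be skipped whenever
    # this lower bound exceeds the best distance found so far.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, query, dist, best)
    return best


# Small usage example on four planar points.
tree = build([(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (5.0, 5.0)])
print(nearest(tree, (0.9, 1.2)))
```

The pruning step is what makes the search fast in favorable cases: a whole subtree is discarded with a single distance computation. How deep such a tree grows depends on how the vantage points and radii are chosen, which is the kind of question a probabilistic analysis addresses.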

Random vantage point tree model

Convergence of the length of the leftmost path

Limit theorems for the length of the leftmost path

One-dimensional convergence