VECTOR SPACE INDEXING FOR BIOSEQUENCE SIMILARITY SEARCHES

Ozgur Ozturk, Hakan Ferhatosmanoglu

doi:10.1142/s0218213005002405

Abstract

We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine their properties. We experimentally compare (a) their approximation quality for k-Nearest Neighbor (k-NN) queries, and (b) their pruning ability and (c) their approximation quality for ε-range queries. The k-NN results presented here show that our proposed distances FD2 and WD2 (i.e., the Frequency and Wavelet Distance functions for 2-grams) perform significantly better than the others. Finally, we develop effective index structures, based on R-trees and scalar quantization, on top of the transformed vectors and distance functions. Promising results from experiments on real biosequence data sets are presented.
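The core idea of mapping subsequences into a numerical vector domain and comparing them there can be illustrated with a small Python sketch. It assumes a DNA alphabet (ACGT), fixed-length sliding windows over the database sequence, overlapping 2-gram count vectors (16 dimensions), and a frequency distance taken as the larger of the total surplus and total deficit between two count vectors. All function names are hypothetical, and this distance is an illustrative assumption rather than the paper's exact FD2 definition. The linear scan in `knn` stands in for the R-tree or scalar-quantization index the abstract describes.

```python
import heapq
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA; a protein version would use 20 amino-acid letters
GRAMS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 16 possible 2-grams
INDEX = {g: i for i, g in enumerate(GRAMS)}


def freq_vector_2gram(seq):
    """Count the overlapping 2-grams of seq into a fixed-length vector."""
    vec = [0] * len(GRAMS)
    for i in range(len(seq) - 1):
        gram = seq[i:i + 2]
        if gram in INDEX:  # skip 2-grams containing non-ACGT symbols (e.g. 'N')
            vec[INDEX[gram]] += 1
    return vec


def frequency_distance(u, v):
    """Hypothetical frequency distance: max of total surplus and total deficit."""
    pos = sum(a - b for a, b in zip(u, v) if a > b)
    neg = sum(b - a for a, b in zip(u, v) if b > a)
    return max(pos, neg)


def sliding_windows(seq, w):
    """All length-w subsequences of seq, in order of starting position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def knn(query, windows, k):
    """Linear-scan k-NN in the transformed space; returns (distance, window index) pairs."""
    q = freq_vector_2gram(query)
    scored = ((frequency_distance(q, freq_vector_2gram(w)), i)
              for i, w in enumerate(windows))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    windows = sliding_windows("ACGTACGTTTGACCA", 6)
    print(knn("ACGTAC", windows, 3))
```

In practice the vectors would be computed once and stored in a multi-dimensional index, so that a query only computes its own vector and visits a small fraction of the database; the transformed distance then serves to rank or filter candidates before any exact alignment is run.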

Journal: International Journal on Artificial Intelligence Tools
Publication Date: Oct 1, 2005
Citations: 16

Similar Papers

Effective indexing and filtering for similarity search in large biosequence databases
O Ozturk ... H Ferhatosmanoglu
10 Mar 2003

Effective Similarity Search in Multimedia Databases using Multiple Representations
H.-P Kriegel ... P Kunath
24 Jul 2006

A multistep approach for shape similarity search in image databases
M Ankerst ... H.-P Kriegel
IEEE Transactions on Knowledge and Data Engineering | VOL. 10
01 Jan 1998

G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Xiaohong Wang ... Gerald H Lushington
Advances in database technology : proceedings. International Conference on Extending Database Technology | VOL. 360
24 Mar 2009