An Adaptive Similarity Search in Massive Datasets

Trong Nhan Phan,Josef Küng,Tran Khanh Dang

doi:10.1007/978-3-662-49175-1_3

Open Access

https://doi.org/10.1007/978-3-662-49175-1_3

Abstract

Similarity search is an important task in many fields of study and in various application domains. The era of big data, however, poses challenges for existing information systems in general and for similarity search in particular. Aiming at large-scale data processing, we propose an adaptive similarity search in massive datasets with MapReduce. Our proposed scheme is applicable and adaptable to popular similarity search cases such as pairwise similarity, search-by-example, range queries, and k-Nearest Neighbour queries. Moreover, we embed our collaborative refinements to effectively minimize irrelevant data objects as well as unnecessary computations. Furthermore, we evaluate our proposed methods with two different document models, known as shingles and terms. Finally, we conduct extensive empirical experiments on real datasets, both to verify these methods themselves and to compare them with previous related work. The results confirm the effectiveness of our proposed methods and show that they outperform the previous work in terms of query processing.
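To make the two document models and two of the query types concrete, here is a minimal single-machine sketch. It is not the authors' MapReduce scheme: the abstract does not name a similarity measure, so Jaccard similarity is an assumption, and the function names (`word_shingles`, `terms`, `jaccard`, `range_query`, `knn_query`) and the sample documents are hypothetical.

```python
import heapq


def word_shingles(text, k=2):
    """Shingle model: the set of contiguous k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def terms(text):
    """Term model: the set of distinct lower-cased words in the text."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (assumed measure, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def range_query(query, docs, threshold, model=terms):
    """Return ids of documents whose similarity to the query is >= threshold."""
    q = model(query)
    return [doc_id for doc_id, text in docs.items()
            if jaccard(q, model(text)) >= threshold]


def knn_query(query, docs, k, model=terms):
    """Return ids of the k documents most similar to the query."""
    q = model(query)
    scored = ((jaccard(q, model(text)), doc_id) for doc_id, text in docs.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "the quick brown fox",
    "d2": "the quick red fox",
    "d3": "a slow green turtle",
}
by_terms = range_query("the quick brown fox", docs, 0.5, model=terms)
by_shingles = range_query("the quick brown fox", docs, 0.5, model=word_shingles)
nearest = knn_query("the quick brown fox", docs, 2)
```

In this toy collection the shingle model is stricter than the term model, because it also takes word order into account: a document that shares most words with the query but in different two-word sequences can pass the term-based threshold and fail the shingle-based one.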

Similar Papers

An Elastic Approximate Similarity Search in Very Large Datasets with MapReduce
Trong Nhan Phan ... Josef Küng
01 Jan 2014

An Efficient Similarity Search in Large Data Collections with MapReduce
Trong Nhan Phan ... Tran Khanh Dang
01 Jan 2014

A Lightweight Indexing Approach for Efficient Batch Similarity Processing with MapReduce
Trong Nhan Phan ... Tran Khanh Dang
SN Computer Science | VOL. 1
25 Jun 2019

Indexing schemes for similarity search in datasets of short protein fragments
Aleksandar Stojmirović ... Vladimir Pestov
Information Systems | VOL. 32
12 Mar 2007
