Abstract

Similarity search and similarity join are important operations in text databases. In some situations, similar queries, called high-frequency queries, are repeated over a period of time. A high-frequency-queries-based filter is used to speed up queries of this type. However, the performance of this filter depends mostly on which high-frequency queries are chosen. This paper proposes two methods, DBRAN and DBSM, which are based on DBSCAN and agglomerative hierarchical clustering algorithms, to find high-frequency queries for the filter. For evaluation, both DBRAN and DBSM are applied to various sets of queries to find high-frequency queries for three datasets. It is found that DBSM performs better than DBRAN when the variation among high-frequency queries is high. However, when the variation among high-frequency queries is low, the performance of DBRAN and DBSM is about the same.
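To make the idea of density-based selection of high-frequency queries concrete, the following is a minimal sketch of plain DBSCAN over a small query log. It is not DBRAN or DBSM, whose details are not given in this abstract. The use of edit distance, the parameter names `eps` and `min_pts`, and the choice of the cluster medoid as the representative high-frequency query are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def dbscan(queries, eps, min_pts):
    """Plain DBSCAN; returns one label per query, with -1 meaning noise.

    eps and min_pts are hypothetical parameter names, not taken from the paper.
    """
    n = len(queries)
    # Neighborhood of each query (includes the query itself).
    neighbors = [
        [j for j in range(n) if edit_distance(queries[i], queries[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point: expand
    return labels


def representatives(queries, labels):
    """Pick the medoid of each cluster as its representative query (assumption)."""
    clusters = {}
    for query, label in zip(queries, labels):
        if label != -1:
            clusters.setdefault(label, []).append(query)
    return {
        label: min(members,
                   key=lambda c: sum(edit_distance(c, m) for m in members))
        for label, members in clusters.items()
    }


query_log = ["database", "databases", "datbase",
             "clustering", "clusterin", "clusterng", "zebra"]
labels = dbscan(query_log, eps=2, min_pts=3)
print(labels)
print(representatives(query_log, labels))
```

In this sketch, dense groups of near-duplicate queries form clusters, isolated queries are treated as noise, and one query per cluster is kept as a candidate high-frequency query for the filter.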
