Abstract

In many database applications, ranking queries may reference both text and numeric attributes, and the ranking functions combine semantic distances (or similarities) for the text attributes with numeric distances for the numeric attributes. In this paper, we propose a new method for evaluating this type of ranking query over a relational database. Using statistics and training, the method builds a mechanism that combines the semantic and numeric distances and balances the influence of text and numeric attributes when matching a query against tuples in a database search. The basic idea is to create an index that uses WordNet to expand the tuple words of text attributes semantically and that also incorporates information about the numeric attributes. Candidate results for a query are retrieved using the index and a simple SQL selection statement, and the top-N answers are then obtained from them. The results of extensive experiments indicate that the new strategy is both efficient and effective.

Summary

Introduction

A relational ranking query (or top-N query) finds the N tuples that best satisfy the query condition, though not necessarily completely, and returns them sorted according to a given ranking function. Research on top-N queries has intensified since the late 1990s [1,2,3]. Most of this work involves numeric attributes and uses a numeric distance function (say, an Lp-norm distance with p = 1, 2, or ∞) to reduce the massive result set of a conventional query to a few of the most relevant answers.

Database IPUMS has two relations, Ipum and Occ, which come from [4]. In relation Occ50(num, occ50), the primary key num is the value label of occupation1950. Relation Ipum has 61 numeric attributes, where A29 is age, A50 is income, and A40 is the foreign key referencing Occ50.num. We added an attribute idx to Ipum as an identifier. Since the word “horticulturist” does not occur in Occ50.occ, the JSEA returns no answers.
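The numeric part of a top-N query, as described above, can be illustrated with a minimal Python sketch. It ranks tuples by their Lp-norm distance to a query point and keeps the N nearest. The function names `lp_distance` and `top_n`, the tuple layout (idx, age, income), and the toy rows are assumptions made for illustration; they are not taken from the paper or from the IPUMS data, and the sketch does not cover the semantic (WordNet-based) part of the proposed method.

```python
import heapq
import math


def lp_distance(x, q, p):
    """L_p distance between two equal-length numeric vectors (p may be math.inf)."""
    diffs = [abs(a - b) for a, b in zip(x, q)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def top_n(tuples, query, n, p=2):
    """Return the n tuples nearest to `query`, closest first.

    Each tuple is (idx, attr_1, ..., attr_k); the identifier in position 0
    is excluded from the distance computation.
    """
    return heapq.nsmallest(n, tuples, key=lambda t: lp_distance(t[1:], query, p))


# Toy rows (idx, age, income): invented values for illustration only.
rows = [(1, 30, 40000), (2, 45, 52000), (3, 29, 41000), (4, 60, 20000)]

# Top-2 tuples nearest to (age=30, income=41000) under the L1 distance.
result = top_n(rows, query=(30, 41000), n=2, p=1)
print(result)
```

This sketch scans every tuple, which is what a conventional query over the full result set would do; the point of top-N processing is to return only the few nearest answers.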
