Abstract

Current relational database systems are deterministic in nature and lack support for approximate matching. The result of approximate matching would be tuples annotated with a percentage of similarity, but existing relational database systems cannot process these similarity scores further. In this paper, we propose a system to support approximate matching in the DBMS field. We introduce ‘≈’ (an uncertain predicate operator) for approximate matching and devise a novel formula to calculate the similarity scores. Instead of returning an empty answer set when there is no match, our system gives ranked results, thereby providing a glance at existing tuples that closely match the queried literals. Two variants of the ‘≈’ operator are also introduced for numeric data: ‘≈+’ for higher-the-better and ‘≈-’ for lower-the-better cases. Efficient approximate string matching methods are proposed for matching string-type data, whereas numeric closeness is used for other types of data (date, time, and number). We also provide the results of our system on several sample queries that illustrate its significance. All experiments are performed using the MySQL database, with the IMDb movie database and the European Football database as sample datasets.

Highlights

  • In traditional databases, select, from, and where are the fundamental clauses of any SQL query

  • We present the queries to calculate the above-mentioned distance by utilizing the in-built features provided by the DBMS (Section IV)

  • The Probability Calculation Module combines the probabilities obtained from the uncertain predicates to calculate the final probability of the filtered tuples (Fig. 4(f)), producing a probabilistic database as output (Fig. 4(g))


Summary

Introduction

Select, from, and where are the fundamental clauses of any SQL query. A general SQL query takes the relations specified in the from clause as input, removes the tuples that do not satisfy the predicates in the where clause, and selects the attributes specified in the select clause. The query returns a filtered relation as output.

A. PATTERN MATCHING IN DETERMINISTIC DATABASE

Pattern matching in a deterministic database is performed using the like operator. Patterns are described using two special characters, ‘%’ and ‘_’: ‘%’ matches any string, i.e., any number of characters, and ‘_’ matches a single character. For example, a query with the pattern ‘%Computer%’ returns the titles of books containing ‘Computer’ as a substring.
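The behavior of the like operator described above can be illustrated with a minimal sketch. It is not taken from the paper: it uses Python's built-in sqlite3 module as a stand-in for the MySQL setup the authors report, and the `book` table, its titles, and the helper `titles_like` are hypothetical names chosen for illustration. Note that SQLite's like is case-insensitive for ASCII characters by default, whereas in MySQL case sensitivity depends on the column collation.

```python
import sqlite3

# Hypothetical in-memory table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO book (title) VALUES (?)",
    [
        ("Computer Networks",),
        ("Introduction to Computer Science",),
        ("Database Systems",),
        ("Cat",),
        ("Cut",),
        ("Cart",),
    ],
)


def titles_like(pattern):
    """Return the titles matching a LIKE pattern, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM book WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [title for (title,) in rows]


# '%' matches any number of characters, so this is a substring search.
substring_matches = titles_like("%Computer%")

# '_' matches exactly one character: 'Cat' and 'Cut' qualify, 'Cart' does not.
single_char_matches = titles_like("C_t")

# A deterministic predicate returns an empty answer set when nothing matches.
no_matches = titles_like("%Compiler%")

print(substring_matches)
print(single_char_matches)
print(no_matches)
```

The last query shows the limitation that motivates the paper: when no tuple satisfies the pattern exactly, the deterministic query gives no indication of which tuples come close.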

