Estimating the selectivity of LIKE queries using pattern-based histograms

Mehmet Aytimur, Ali Çakmak

doi:10.3906/elk-1806-96

Abstract

Accurate cost and time estimation of a query is one of the major success indicators for database management systems. SQL allows the expression of flexible queries on text-formatted data. The LIKE operator is used to search for a specified pattern (e.g., LIKE "luck %") in a string database. Accurately estimating the selectivity of such flexible predicates is vital for the query optimizer to choose an efficient execution plan. In this paper, we study the problem of estimating the selectivity of a LIKE predicate over a bag of strings. We propose a new type of pattern-based histogram structure to summarize the data distribution in a particular column. More specifically, we first mine sequential patterns over a given string database and then construct a special histogram out of the mined patterns. At query optimization time, pattern-based histograms are exploited to estimate the selectivity of a LIKE predicate. Experimental results on a real dataset from DBLP show that the proposed technique outperforms the state of the art for generic LIKE queries of the form $\%s_1\%s_2\%\ldots\%s_n\%$, where each $s_i$ represents one or more characters. Moreover, the proposed histogram structure requires more than two orders of magnitude less memory, and estimation is almost an order of magnitude faster than with the state of the art.
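
For reference, the quantity being estimated can be computed exactly with a full scan. The minimal Python sketch below (function names are hypothetical, not taken from the paper) checks whether a string matches a generic pattern $\%s_1\%s_2\%\ldots\%s_n\%$ by searching for the pieces left to right, and reports the fraction of matching strings in a bag. It assumes each $s_i$ is a literal string without the `_` wildcard.

```python
from typing import Iterable, Sequence


def matches_generic_like(s: str, pieces: Sequence[str]) -> bool:
    """True if s matches %p1%p2%...%pn%: the pieces occur in s in order, without overlapping."""
    pos = 0
    for piece in pieces:
        idx = s.find(piece, pos)  # leftmost occurrence at or after pos
        if idx < 0:
            return False
        pos = idx + len(piece)  # the next piece must start after this one ends
    return True


def true_selectivity(strings: Iterable[str], pieces: Sequence[str]) -> float:
    """Exact fraction of strings in the bag that satisfy the generic LIKE predicate."""
    bag = list(strings)
    if not bag:
        return 0.0
    hits = sum(matches_generic_like(s, pieces) for s in bag)
    return hits / len(bag)


titles = ["query optimization", "selectivity estimation", "string histograms", "query selectivity"]
print(true_selectivity(titles, ["query", "select"]))  # %query%select% -> 0.25
```

Greedy leftmost search is sufficient here: taking the earliest occurrence of each piece leaves the longest possible suffix for the remaining pieces. A histogram-based estimator aims to approximate this value without scanning the column.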

Highlights

One of the key reasons for the success of relational database management systems is their advanced query optimization capabilities
We propose a new technique, SPH, to estimate the selectivity of LIKE query predicates based on a novel summary structure called pattern-based histograms
We build a special histogram structure on top of the sequential patterns extracted from a string database
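
The excerpt does not describe how SPH mines patterns or forms histogram buckets, so the following is not the authors' method. It is a deliberately simplified stand-in, assumed for illustration only: a summary that counts how many strings contain each substring of length up to `max_len`, consulted at optimization time under an independence assumption across the pieces of $\%s_1\%\ldots\%s_n\%$. All names (`build_summary`, `estimate_piece`, `estimate_like`) are hypothetical.

```python
from collections import Counter
from math import prod
from typing import Dict, Sequence, Tuple

# Hypothetical summary: (substring -> number of strings containing it, number of strings, max substring length)
Summary = Tuple[Dict[str, int], int, int]


def build_summary(strings: Sequence[str], max_len: int = 3, min_support: int = 1) -> Summary:
    """Count, for each substring of length 1..max_len, how many strings contain it."""
    counts: Counter = Counter()
    for s in strings:
        # Use a set so each string adds at most 1 to a substring's count.
        seen = {
            s[i:j]
            for i in range(len(s))
            for j in range(i + 1, min(i + max_len, len(s)) + 1)
        }
        counts.update(seen)
    # Drop rare substrings to keep the summary small; they are treated as count 0 later.
    kept = {g: c for g, c in counts.items() if c >= min_support}
    return kept, len(strings), max_len


def estimate_piece(piece: str, summary: Summary) -> float:
    """Estimate the fraction of strings containing a non-empty literal piece."""
    counts, n, max_len = summary
    if n == 0:
        return 0.0
    if len(piece) <= max_len:
        grams = [piece]
    else:
        # Overlapping windows of length max_len that cover the whole piece.
        grams = [piece[i:i + max_len] for i in range(len(piece) - max_len + 1)]
    # A string containing the piece contains every window, so the rarest window bounds its count.
    return min(counts.get(g, 0) for g in grams) / n


def estimate_like(pieces: Sequence[str], summary: Summary) -> float:
    """Estimate selectivity of %p1%p2%...%pn%, assuming pieces occur independently (order ignored)."""
    return prod(estimate_piece(p, summary) for p in pieces)


names = ["lucia", "lucas", "luke", "maria"]
summary = build_summary(names, max_len=3)
print(estimate_like(["lu", "a"], summary))  # 0.5625
```

On these four names the estimate for `%lu%a%` is 0.5625, while the true selectivity is 0.5 (only "lucia" and "lucas" match). The gap comes from the independence assumption, which ignores both the correlation between pieces and their order; a summary built over sequential patterns can, by construction, retain ordering information that isolated substring counts lose.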

Summary

Introduction

One of the key reasons for the success of relational database management systems is their advanced query optimization capabilities. The query optimizer explores all or a subset of the possible execution plans and determines the most efficient way to execute a given query. With the explosion of the Internet and text-based data, the query optimizer plays an even more critical role in efficiently querying huge amounts of textual data. Rather than exact string equality predicates, flexible patterns are often preferred for searching such large text collections. SQL provides the LIKE operator to express such pattern-based string searches. Consider a table CUSTOMERS that stores customer records such as name, age, and salary. To retrieve all customers whose names start with "Lucia", one writes the following SQL query: SELECT * FROM CUSTOMERS WHERE name LIKE 'Lucia%'
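
To make the semantics of the predicate concrete, the Python sketch below translates a LIKE pattern into a regular expression (`%` matches any run of characters, `_` exactly one) and computes its selectivity over a small, made-up list of names. It is a simplification under stated assumptions: matching is case-sensitive, whereas real systems may follow the column collation, and the `ESCAPE` clause is not supported. The function names are hypothetical.

```python
import re
from typing import Iterable


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern: % -> any run of characters, _ -> exactly one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # literal characters, including regex metacharacters
    # DOTALL lets % and _ also match newlines; callers use fullmatch, since LIKE matches the whole value.
    return re.compile("".join(parts), re.DOTALL)


def like_selectivity(values: Iterable[str], pattern: str) -> float:
    """Fraction of values satisfying `value LIKE pattern` (case-sensitive, no ESCAPE clause)."""
    regex = like_to_regex(pattern)
    bag = list(values)
    if not bag:
        return 0.0
    return sum(regex.fullmatch(v) is not None for v in bag) / len(bag)


names = ["Lucia Rossi", "Lucian Gray", "Luca Bianchi", "Maria Lucia"]
print(like_selectivity(names, "Lucia%"))  # 0.5
```

Two of the four sample names match `'Lucia%'`, giving a selectivity of 0.5. An optimizer needs such a figure before running the query, which is why an estimate from a compact summary is useful.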

Journal: Turkish Journal of Electrical Engineering and Computer Sciences
Publication Date: Nov 29, 2018
Citations: 3
License type: cc-by

Similar Papers

The processes of public megaproject cost estimation: The inaccuracy of reference class forecasting
Tim Neerup Themsen
Financial Accountability and Management | VOL. 35
01 Jul 2019

A novel soft computing model to increase the accuracy of software development cost estimation
Iman Attarzadeh ... Siew Hock Ow
01 Feb 2010

Identification of fuzzy models of software cost estimation
Zhiwei Xu ... Taghi M Khoshgoftaar
Fuzzy Sets and Systems | VOL. 145
06 Nov 2003

Web-based cost estimation of machining rotational parts
David Ben-Arieh ... Qian Li
Production Planning and Control | VOL. 14
01 Dec 2003