Learned Query Optimizers
- Research Article
- 10.1007/s00778-005-0172-6
- Apr 28, 2006
- The VLDB Journal
While the volume of information published in the form of XML-compliant documents keeps growing rapidly, efficient and effective query processing and optimization for XML have become more important than ever. This article reports our recent advances in XML structured-document query optimization. In this article, we elaborate on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. This article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach directly yields high optimization efficiency and simplicity of implementation. Our approach is a logical-level one, independent of any particular storage model. Therefore, optimizers developed on the basis of our approach can be easily adapted to a broad range of XML data/information servers to achieve fast query optimization. An experimental study confirms the validity and effectiveness of the proposed approach.
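The deterministic-rewrite idea above can be sketched in miniature: represent queries as expression trees and repeatedly apply equivalence-preserving rules bottom-up until no rule fires. The tuple encoding and the two rules below are illustrative assumptions, not the paper's actual PAT algebra or rule set.

```python
# A minimal sketch of deterministic, rule-based algebraic rewriting.
# Expressions are nested tuples: ("intersect", a, b), ("child", tag, e), ...
# The rules shown (idempotence, identity) are generic set-algebra
# equivalences, chosen only to illustrate the mechanism.

def rewrite(expr):
    """Apply deterministic rules bottom-up; each rule strictly simplifies,
    so a single bottom-up pass suffices for this rule set."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(rewrite(e) for e in expr)  # rewrite subexpressions first
    # Rule: intersect(e, e) -> e  (idempotence of set intersection)
    if expr[0] == "intersect" and expr[1] == expr[2]:
        return expr[1]
    # Rule: union(e, empty) -> e  (identity element of union)
    if expr[0] == "union" and expr[2] == ("empty",):
        return expr[1]
    return expr

q = ("intersect", ("child", "title", ("doc",)), ("child", "title", ("doc",)))
print(rewrite(q))  # the duplicated subquery collapses to a single scan
```

Because every rule is deterministic (no cost model, no search), optimization time is linear in the expression size, which is the efficiency argument the abstract makes.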
- Research Article
- 10.1145/3068612
- Sep 25, 2017
- Communications of the ACM
We propose a generalization of the classical database query optimization problem: multi-objective parametric query (MPQ) optimization. MPQ compares alternative processing plans according to multiple execution cost metrics. It also models, as parameters, missing pieces of information on which plan costs depend. Both features are crucial for modeling query processing on modern data processing platforms. MPQ generalizes previously proposed query optimization variants, such as multi-objective query optimization, parametric query optimization, and traditional query optimization. We show, however, that the MPQ problem has different properties than prior variants, and solving it requires novel methods. We present an algorithm that solves the MPQ problem and finds, for a given query, the set of all relevant query plans. This set contains all plans that realize optimal execution cost tradeoffs for any combination of parameter values. Our algorithm is based on dynamic programming and recursively constructs relevant query plans by combining relevant plans for query parts. We assume that all plan execution cost functions are piecewise linear in the parameters. We use linear programming to compare alternative plans and to identify plans that are not relevant. We present a complexity analysis of our algorithm and experimentally evaluate its performance.
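The notion of a "relevant" plan can be illustrated with a toy: a plan is kept only if, for some parameter value, no other plan beats it on every cost metric at once. The real algorithm compares piecewise-linear cost functions exactly via linear programming; the sketch below merely samples the parameter domain, and all plan data is invented.

```python
# Toy version of relevant-plan pruning in multi-objective parametric
# optimization. Each plan is a list of (intercept, slope) pairs, one per
# cost metric, with cost linear in a single parameter p in [0, 1].

def costs(plan, p):
    """Evaluate all cost metrics of a plan at parameter value p."""
    return [a + b * p for (a, b) in plan]

def dominates(c1, c2):
    """Pareto dominance: no worse on all metrics, strictly better on one."""
    return all(x <= y for x, y in zip(c1, c2)) and any(x < y for x, y in zip(c1, c2))

def relevant_plans(plans, samples=11):
    """Keep plans that are Pareto-optimal at some sampled parameter value."""
    keep = set()
    for i in range(samples):
        p = i / (samples - 1)
        vecs = [costs(pl, p) for pl in plans]
        for j, c in enumerate(vecs):
            if not any(dominates(d, c) for k, d in enumerate(vecs) if k != j):
                keep.add(j)
    return sorted(keep)

plans = [
    [(1, 0), (5, 0)],    # plan 0: fast, expensive, parameter-independent
    [(5, 0), (1, 0)],    # plan 1: slow, cheap
    [(0, 10), (0, 10)],  # plan 2: best for small p, worst for large p
    [(6, 0), (6, 0)],    # plan 3: dominated by plan 0 everywhere
]
print(relevant_plans(plans))  # plan 3 is never relevant at any p
```

Sampling can miss narrow regions; that is exactly why the paper's algorithm uses linear programming to decide relevance exactly over whole parameter regions.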
- Research Article
- 10.1145/219713.219769
- Dec 1, 1995
- ACM SIGMOD Record
The National Technical University of Athens (NTUA) is the leading Technical University in Greece. The Computer Science Division of the Electrical and Computer Engineering Department covers several fields of practical, theoretical and technical computer science and is involved in several research projects supported by the EEC, the government and industrial companies. The Knowledge and Data Base Systems (KDBS) Laboratory was established in 1992 at the National Technical University of Athens. It is recognised internationally, as evidenced by its participation as a central node in the Esprit Network of Excellence IDOMENEUS (Information and Data on Open MEdia for NEtworks of USers), a project that aims to coordinate and improve European efforts in the development of next-generation information environments capable of maintaining and communicating a largely extended class of information in an open set of media. The KDBS Laboratory employs one full-time research engineer and several graduate students. Its infrastructure includes a LAN with several DECstation 5000/200 and 5000/240 workstations, an HP Multimedia Workstation, several PCs and software for database and multimedia applications. The basic research interests of our Laboratory include: Spatial Database Systems, Multimedia Database Systems and Active Database Systems. Apart from the above database areas, the interests of the KDBS Laboratory span several areas of Information Systems, such as Software Engineering Databases, Transactional Systems, Image Databases, Conceptual Modeling, Information System Development, Temporal Databases, Advanced Query Processing and Optimization Techniques. The group's efforts on Spatial Database Systems include the study of new data structures, storage techniques, retrieval mechanisms and user interfaces for large geographic data bases.
In particular, we look at specialized spatial data structures (R-Trees and their variations) which allow direct access to the data based on their spatial properties, rather than on some encoded representation of the objects' coordinates. We study implementation and optimization techniques for spatial data structures and develop models for performance estimation. Finally, we are investigating techniques for the efficient representation of relationships and reasoning in space. The activities on Multimedia Database Systems include the study of advanced data models, storage techniques, retrieval mechanisms and user interfaces for large multimedia data bases. The data models under study include the object-oriented model and the relational model with appropriate extensions to support multimedia data. We are also investigating content-based search techniques for image data bases. In a different direction, we are studying issues involved in the development of multimedia front-ends for conventional, relational data base systems. In the area of Active Database Systems, we are developing new mechanisms for implementing triggers in relational databases. Among the issues involved, we address the problem of efficiently finding qualifying rules against updates in large sets of triggers. This problem is especially critical in database system implementations of triggers, where large amounts of data may have to be searched in order to find out whether a particular trigger qualifies to run. Continuing work that started at the Foundation for Research and Technology (FORTH), Institute of Computer Science, the group is investigating reuse-oriented approaches to information systems application development. The approaches are based on a repository that has been implemented at FORTH as a special-purpose object store, with emphasis on multimodal and fast retrieval. Issues of relating and describing software artifacts (designs, code, etc.)
are among the topics under investigation. An important new research direction of the group is Data Warehouses, which are seen as collections of materialized views captured over a period of time from a heterogeneous distributed information system. Issues such as consistent updates, data warehouse evolution, view reconciliation and data quality are being investigated. Research in Image Databases deals with retrieval by image content, using techniques from the area of Image Processing. We are currently at an early stage in this direction, having collected many segmentation and edge detection algorithms, which will be used and evaluated on images of various contents. Our work on Advanced Query Processing and Optimization Techniques includes dynamic, or parametric, query optimization techniques. In most database systems, the values of many important runtime parameters of the system, the data, or the query are unknown at query optimization time. Dynamic, or parametric, query optimization attempts to identify several execution plans, each of which is optimal for a subset of all possible values of the runtime parameters. In the next sections we present in detail our research efforts in the three main research areas of the KDBS Laboratory: Spatial, Multimedia and Active Databases.
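The parametric optimization idea in the last paragraph can be sketched in a few lines: each plan's cost depends on a parameter unknown at optimization time (here, a selectivity in [0, 1]), and the optimizer precomputes which plan is optimal on which sub-range, deferring the final choice to run time. The plan names and cost functions are invented for illustration.

```python
# Toy single-metric parametric optimization: partition the parameter
# domain [0, 1] by sampling, recording which plan is cheapest where.

def optimal_regions(plans, steps=100):
    """Return maximal intervals of [0, 1] that share one optimal plan."""
    regions = []
    for i in range(steps + 1):
        p = i / steps
        best = min(plans, key=lambda pl: pl[1](p))[0]
        if regions and regions[-1][0] == best:
            regions[-1] = (best, regions[-1][1], p)  # extend current region
        else:
            regions.append((best, p, p))             # start a new region
    return regions

plans = [("index_scan", lambda s: 1 + 100 * s),  # great at low selectivity
         ("full_scan",  lambda s: 20)]           # constant cost
for name, lo, hi in optimal_regions(plans):
    print(f"{name}: selectivity in [{lo:.2f}, {hi:.2f}]")
```

At run time, once the actual selectivity is known, the system simply picks the precomputed plan for the interval containing it, with no further optimization.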
- Book Chapter
- 10.1007/0-387-25229-0_2
- Jan 1, 2005
Query execution and optimization for streaming data revisits almost all aspects of query execution and optimization over traditional, disk-bound database systems. The reason is that two fundamental assumptions of disk-bound systems are dropped: (i) the data resides on disk, and (ii) the data is finite. As such, new evaluation algorithms and new optimization metrics need to be devised. The approaches can be broadly classified into two categories. First, there are static approaches that follow the traditional optimize-then-execute paradigm by assuming that optimization-time assumptions will continue to hold during execution; the environment is expected to be relatively static in that respect. Alternatively, there are adaptive approaches that assume the environment is completely dynamic and highly unpredictable. In this chapter we explore both approaches and present novel query optimization and evaluation techniques for queries over streaming sources.
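The adaptive category can be illustrated with a minimal sketch: reorder a pipeline of stream filters at run time based on observed selectivities, so the predicate most likely to reject a tuple runs first. The class and the smoothed-pass-rate policy below are an illustration of that flavor of adaptivity, not techniques taken from this chapter.

```python
# Adaptive filter reordering over a stream: cheap per-tuple statistics
# drive the evaluation order, with no optimize-then-execute phase.

class AdaptiveFilterPipeline:
    def __init__(self, predicates):
        # per-predicate counters: [predicate, tuples seen, tuples passed]
        self.stats = [[p, 0, 0] for p in predicates]

    def process(self, tup):
        # run the lowest observed pass rate first (Laplace-smoothed so
        # untried predicates start at 0.5 rather than dividing by zero)
        self.stats.sort(key=lambda s: (s[2] + 1) / (s[1] + 2))
        for s in self.stats:
            s[1] += 1
            if s[0](tup):
                s[2] += 1
            else:
                return False  # short-circuit on the first failing predicate
        return True

pipe = AdaptiveFilterPipeline([lambda t: t % 2 == 0, lambda t: t < 3])
accepted = [t for t in range(10) if pipe.process(t)]
print(accepted)  # even numbers below 3
```

Static approaches would fix the predicate order once from optimization-time estimates; the point of the adaptive variant is that the order keeps tracking the stream as selectivities drift.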
- Research Article
- 10.1145/1138394.1138397
- Jun 1, 2006
- ACM Transactions on Database Systems
Large-scale distributed environments, where each node is completely autonomous and offers services to its peers through external communication, pose significant challenges to query processing and optimization. Autonomy is the main source of the problem, as it results in lack of knowledge about any particular node with respect to the information it can produce and its characteristics, for example, cost of production or quality of produced results. In this article, inspired by e-commerce technology, we recognize queries as commodities and model query optimization as a trading negotiation process. Subquery answers and subquery operator execution jobs are traded between nodes until deals are struck with some nodes for all of them. Such trading may also occur recursively, in the sense that some nodes may play the role of intermediaries between other nodes (subcontracting). We identify the key parameters of the overall framework and suggest several potential alternatives for each one. In comparison to trading negotiations for e-commerce, query optimization faces unique new challenges that stem primarily from the fact that queries have a complex structure and can be broken into smaller parts. We address these challenges through a particular instantiation of our framework focusing primarily on the optimization algorithms run on “buying” and “selling” nodes, the evaluation metrics of the queries, and the negotiation strategy. Finally, we present the results of several experiments that demonstrate the performance characteristics of our approach compared to those of traditional query optimization.
- Research Article
- 10.5075/epfl-thesis-6995
- Jan 1, 2016
The goal of query optimization is to map a declarative query (describing the data to generate) to a query plan (describing how to generate the data) with optimal execution cost. Query optimization is required to support declarative query interfaces. It is a core problem in the area of database systems and has received tremendous attention in the research community, starting with an initial publication in 1979. In this thesis, we revisit the query optimization problem. The revisit is motivated by several developments that change the context of query optimization but are not yet reflected in prior literature. First, advances in query execution platforms and processing techniques have changed the context of query optimization. Novel provisioning models and processing techniques such as Cloud computing, crowdsourcing, or approximate processing allow trading between different execution cost metrics (e.g., execution time versus monetary execution fees in the case of Cloud computing). This makes it necessary to compare alternative execution plans according to multiple cost metrics during query optimization. While this is a common scenario nowadays, the literature on query optimization with multiple cost metrics (a generalization of the classical problem variant with one execution cost metric) is surprisingly sparse. While prior methods take hours to optimize even moderately sized queries when considering multiple cost metrics, we propose a multitude of approaches that make query optimization in such scenarios practical. A second development that we address in this thesis is the availability of novel software and hardware platforms that can be exploited for optimization. We show that integer programming solvers, massively parallel clusters (which nowadays are commonly used for query execution), and adiabatic quantum annealers enable us to solve query optimization problem instances that are far beyond the capabilities of prior approaches.
In summary, we propose seven novel approaches to query optimization that significantly increase the size of the problem instances that can be addressed (measured by the query size and by the number of execution cost metrics considered). Those approaches fall into three broad categories: moving query optimization before run time to relax constraints on optimization time; trading optimization time for relaxed optimality guarantees (leading to approximation schemes, incremental algorithms, and randomized algorithms for query optimization with multiple cost metrics); and reducing optimization time by leveraging novel software and hardware platforms (integer programming solvers, massively parallel clusters, and adiabatic quantum annealers). The approaches are novel either because they address novel problem variants of query optimization introduced in this thesis, because they are the first of their kind for their respective problem variant (e.g., we propose the first randomized algorithm for query optimization with multiple cost metrics), or because they have never been used for optimization problems in the database domain (e.g., this is the first time that quantum computing is used to solve a database-specific optimization problem).
- Conference Article
- 10.5555/2694443.2694454
- Dec 14, 2012
Query processing and optimization in centralized and distributed environments is well researched. Centralized query optimization has focused on minimizing the number of input/output (I/O) operations against disk. Distributed query processing has focused mainly on maximizing local computation and minimizing data transfer between nodes; there, the distribution of data was pre-determined and both connectivity and bandwidth were pre-defined and guaranteed. Work on sensor data acquisition deals with non-join queries without taking mobility and connectivity interruptions into consideration. However, these assumptions no longer hold when queries are executed over repositories stored in mobile aerial vehicles which collect, process, and store data in real time, and connectivity changes significantly over the duration of interest. Currently, only data in one vehicle can be queried by the ground control. This paper explores query processing and optimization issues, along with the concomitant metadata needed for processing and optimizing queries over distributed, mobile, connectivity-challenged environments. Since response time and fault tolerance are the main focus, we propose plans using join, semi-join, and replication-based approaches. We propose and evaluate several heuristics for this environment, ranging from greedy to cumulative approaches, along with the use of replicated copies of data. We have performed an elaborate experimental analysis to validate the heuristics that work well for this environment. As maintaining replication is a challenge in this environment, we summarize our initial approach. This work on connectivity-tolerant query optimization is part of a larger middleware-based, service-oriented architecture.
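The semi-join approach mentioned above is attractive precisely when links are slow or intermittent: before shipping a remote relation, first ship only the local relation's join-key values, filter remotely, and transfer only the matching tuples. The sketch below shows the reduction step with invented data; the vehicle/ground-control framing follows the paper's scenario, the relation contents do not.

```python
# Semi-join reduction: trade one extra (small) key-set transfer for a
# (potentially large) cut in the tuples shipped back over the network.

def semi_join_reduce(R, S, key_r, key_s):
    """Reduce S to the tuples that can join with R on the given keys."""
    keys = {r[key_r] for r in R}               # small projection shipped to S's node
    S_reduced = [s for s in S if s[key_s] in keys]
    tuples_saved = len(S) - len(S_reduced)     # tuples that never cross the network
    return S_reduced, tuples_saved

R = [{"vid": 1}, {"vid": 3}]                          # e.g., at ground control
S = [{"vid": v, "reading": v * 10} for v in range(6)] # e.g., at an aerial vehicle
S_red, saved = semi_join_reduce(R, S, "vid", "vid")
print(S_red, saved)  # only vid 1 and 3 survive; 4 tuples stay put
```

Whether the semi-join pays off depends on the key-set size versus the tuples it eliminates, which is exactly the kind of tradeoff the paper's heuristics weigh against replication-based plans.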
- Research Article
- 10.1023/a:1008757010516
- Mar 1, 1999
- Journal of Intelligent Information Systems
Object-oriented databases (OODBs) provide powerful data abstractions and modeling facilities, but they usually lack a suitable framework for query processing and optimization. Even though there is an increasing number of recent proposals on OODB query optimization, only a few of them actually focus on query optimization in the presence of object identity and destructive updates, features supported by most realistic OODB languages. This paper presents a formal framework for optimizing object-oriented queries in the presence of side effects. These queries may contain object updates at any place and in any form. We present a language extension to the monoid comprehension calculus to express these object-oriented features, and we give a formal meaning to these extensions. Our method is based on denotational semantics, which is often used to give a formal meaning to imperative programming languages. The semantics of our language extensions is expressed in terms of our monoid calculus, without the need for any fundamental change to our basic framework. Our method not only maintains referential transparency, which allows us to do meaningful query optimization, but is also practical for optimizing OODB queries, since it allows the same optimization techniques applied to regular queries to be used, with minimal changes, for OODB queries with updates.
- Book Chapter
- 10.1007/3-540-50345-5_38
- Jan 1, 1988
The use of data abstraction in object-oriented databases places a burden on the ability of the system to perform query optimization. This paper discusses a framework for query specification and optimization that is applicable to object-oriented database systems that take a strict view of data abstraction. It examines techniques that preserve much of the optimization potential of relational languages by limiting the query language. It further examines techniques for query optimization that involve type-specific rewrite rules.
- Conference Article
- 10.1109/hicss.1989.48055
- Jan 3, 1989
The use of data abstraction in object-oriented databases places a burden on the ability of the system to perform query optimization. A framework for query specification and optimization is discussed that is applicable to object-oriented database systems that take a strict view of data abstraction. Techniques that preserve much of the optimization potential of relational languages by limiting the query language are examined. Techniques are given for query optimization that involve type-specific rewrite rules.
- Conference Article
- 10.1109/iccsit.2010.5564420
- Jul 1, 2010
Spatial Data Warehouses (SDWs) combine spatial databases (SDBs) and data warehouses (DWs), allowing analysis of historical data. This data can be queried using Spatial On-Line Analytical Processing (SOLAP). SDW and SOLAP systems are emerging areas that raise several research issues. In this paper, we address a problem existing in SDWs that motivated us to propose a framework: a conceptual multidimensional model able to express users' requirements for SDW and SOLAP applications. We also present a research direction that is important to consider in providing satisfactory solutions for SDW and SOLAP systems: spatial query optimization. For the past several years, research on spatial database systems has progressed actively, because applications using spatial information, such as geographic information systems (GIS), computer-aided design (CAD) and multimedia systems, have increased. However, most of this research has dealt with only one part of spatial database systems at a time, such as data models, spatial indexes, spatial join algorithms, or cost models. There has been little research on spatial query optimization that could integrate them. Moreover, most of the spatial query optimization techniques published so far have not properly reflected the characteristics of SDBs. This paper presents query optimization strategies that take the characteristics of SDBs into account, and discusses the application of standard query processing and optimization techniques in the context of an integrated SDB environment.
- Conference Article
- 10.1145/1341771.1341789
- Jan 18, 2008
The disparate and geographically distributed data sources in an enterprise can be integrated using distributed computing technologies such as data grids. The real challenge involved in such data integration efforts lies in the design and development of the distributed query processing engine beneath such integrated systems. In the current literature, distributed query processing and optimization is carried out in three distinct phases, namely: (1) creation of a single-node plan, (2) generation of a parallel plan, and (3) optimal site selection for plan execution. As considering the three phases in isolation leads to sub-optimal plans, this paper proposes a new distributed query optimization model that integrates all three phases of query optimization. The paper also presents different heuristic approaches for solving the proposed integrated distributed query processing problem. Furthermore, the presented system is integrated with a data grid solution, and several real-time experiments are conducted to demonstrate its usefulness.
- Conference Article
- 10.1109/mixdes.2007.4286255
- Jun 1, 2007
This paper discusses methods for query execution and optimization in object-oriented grid databases. Queries over distributed databases have become increasingly complex, which complicates the query execution and optimization process and calls for advanced algorithms and techniques to reduce processing and communication costs. The paper outlines distributed data processing, which is currently the subject of our study and development.
- Conference Article
- 10.1109/icde.1991.131517
- Apr 8, 1991
The notion of a polymorphic database and the optimization of polymorphic queries, specifically the optimization of queries under the Morpheus data model, is addressed. The notion of query optimization through type inference, applicable both to polymorphic databases and to traditional monomorphic databases, is introduced. The Morpheus data model and its type inference rules are reviewed and a polymorphic relational algebra is characterized. It is shown how the inference rules can be used for static optimization of a few sample queries. It is concluded that type inference provides a formal mechanism for optimizing a very rich extension of the relational algebra. The approach retains the basic framework that led to the wide acceptance of the relational model, while enriching it with the structural expressiveness of the object-oriented approaches of recent years.
- Research Article
- 10.1137/s0097539794262446
- Jan 1, 2000
- SIAM Journal on Computing
In the optimization of queries in an object-oriented database (OODB) system, a natural first step is to use the typing constraints imposed by the schema to transform a query into an equivalent one that logically accesses a minimal set of objects. We study a class of queries for OODBs called conjunctive queries. Variables in a conjunctive query range over heterogeneous sets of objects. Consequently, a conjunctive query is equivalent to a union of conjunctive queries of a special kind, called terminal conjunctive queries. Testing containment is a necessary step in solving the equivalence and minimization problems. We first characterize the containment and minimization conditions for the class of terminal conjunctive queries. We then characterize containment for the class of all conjunctive queries and derive an optimization algorithm for this class. The equivalent optimal query produced is expressed as a union of terminal conjunctive queries, which has the property that the number of variables as well as their search spaces are minimal among all unions of terminal conjunctive queries. Finally, we investigate the complexity of the containment problem. We show that it is complete in $\Pi^{p}_{2}$.
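The containment test underlying these results can be sketched for its classical relational analogue: Q1 contains Q2 iff there is a containment mapping (a homomorphism) from Q1's variables to Q2's terms that maps every body atom of Q1 onto a body atom of Q2 and preserves the head. The brute-force search below is illustration only; the paper's OODB setting layers typing constraints from the schema on top of this.

```python
# Brute-force containment-mapping test for relational conjunctive queries.
# A query is (head_vars, body), where body is a set of atoms (relname, args...).

from itertools import product

def contains(q1, q2):
    """True iff Q1 contains Q2, i.e. a containment mapping Q1 -> Q2 exists."""
    head1, body1 = q1
    head2, body2 = q2
    vars1 = sorted({a for atom in body1 for a in atom[1:]})
    terms2 = {a for atom in body2 for a in atom[1:]}
    for images in product(terms2, repeat=len(vars1)):
        h = dict(zip(vars1, images))
        if [h[v] for v in head1] != list(head2):
            continue  # the head must be preserved by the mapping
        if all((atom[0], *[h[a] for a in atom[1:]]) in body2 for atom in body1):
            return True  # every atom of Q1's body lands on an atom of Q2's
    return False

# Q1(x) :- R(x, y)          Q2(x) :- R(x, y), R(y, x)
q1 = (("x",), {("R", "x", "y")})
q2 = (("x",), {("R", "x", "y"), ("R", "y", "x")})
print(contains(q1, q2), contains(q2, q1))  # Q1 contains Q2, not vice versa
```

The search over all variable images is exponential, which is consistent with containment being a hard problem; the paper pins down the exact complexity for its query class as completeness in $\Pi^{p}_{2}$.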