From Massive Parallelization to Quantum Computing: Seven Novel Approaches to Query Optimization
The goal of query optimization is to map a declarative query (describing data to generate) to a query plan (describing how to generate the data) with optimal execution cost. Query optimization is required to support declarative query interfaces. It is a core problem in the area of database systems and has received tremendous attention in the research community, starting with an initial publication in 1979. In this thesis, we revisit the query optimization problem, motivated by several developments that change its context and that are not yet reflected in prior literature. First, advances in query execution platforms and processing techniques have changed the context of query optimization. Novel provisioning models and processing techniques such as Cloud computing, crowdsourcing, and approximate processing allow trading between different execution cost metrics (e.g., execution time versus monetary execution fees in the case of Cloud computing). This makes it necessary to compare alternative execution plans according to multiple cost metrics during query optimization. While this is a common scenario nowadays, the literature on query optimization with multiple cost metrics (a generalization of the classical problem variant with one execution cost metric) is surprisingly sparse. While prior methods take hours to optimize even moderately sized queries when considering multiple cost metrics, we propose a multitude of approaches that make query optimization in such scenarios practical. A second development that we address in this thesis is the availability of novel software and hardware platforms that can be exploited for optimization. We show that integer programming solvers, massively parallel clusters (which nowadays are commonly used for query execution), and adiabatic quantum annealers enable us to solve query optimization problem instances that are far beyond the capabilities of prior approaches.
In summary, we propose seven novel approaches to query optimization that significantly increase the size of the problem instances that can be addressed (measured by the query size and by the number of considered execution cost metrics). Those approaches fall into three broad categories: moving query optimization before run time to relax constraints on optimization time, trading optimization time for relaxed optimality guarantees (leading to approximation schemes, incremental algorithms, and randomized algorithms for query optimization with multiple cost metrics), and reducing optimization time by leveraging novel software and hardware platforms (integer programming solvers, massively parallel clusters, and adiabatic quantum annealers). These approaches are novel either because they address new problem variants of query optimization introduced in this thesis, because they are the first of their kind for their respective problem variant (e.g., we propose the first randomized algorithm for query optimization with multiple cost metrics), or because they have never been used for optimization problems in the database domain (e.g., this is the first time that quantum computing is used to solve a database-specific optimization problem).
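The multi-metric plan comparison at the heart of this thesis can be illustrated with a small sketch. All plan names and cost values below are invented; the dominance test itself is the standard Pareto definition used throughout multi-objective query optimization:

```python
def dominates(a, b):
    """True if cost vector a is at least as good as b on every metric
    and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(plans):
    """Keep only plans whose cost vectors no other plan dominates."""
    return [p for p in plans
            if not any(dominates(q[1], p[1]) for q in plans if q is not p)]

# Hypothetical plans with (execution time in s, monetary fee in $) costs.
plans = [("hash-join", (4.0, 0.10)),
         ("sort-merge", (6.0, 0.05)),
         ("nested-loop", (9.0, 0.12))]  # dominated by hash-join on both metrics

# Keeps hash-join and sort-merge: they realize different time/fee tradeoffs.
print(pareto_frontier(plans))
```

With a single cost metric the frontier collapses to one cheapest plan, which is exactly the classical problem variant the thesis generalizes.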
- Research Article
26
- 10.14778/2735508.2735512
- Nov 1, 2014
- Proceedings of the VLDB Endowment
Classical query optimization compares query plans according to one cost metric and associates each plan with a constant cost value. In this paper, we introduce the Multi-Objective Parametric Query Optimization (MPQ) problem where query plans are compared according to multiple cost metrics and the cost of a given plan according to a given metric is modeled as a function that depends on multiple parameters. The cost metrics may for instance include execution time or monetary fees; a parameter may represent the selectivity of a query predicate that is unspecified at optimization time. MPQ generalizes parametric query optimization (which allows multiple parameters but only one cost metric) and multi-objective query optimization (which allows multiple cost metrics but no parameters). We formally analyze the novel MPQ problem and show why existing algorithms are inapplicable. We present a generic algorithm for MPQ and a specialized version for MPQ with piecewise-linear plan cost functions. We prove that both algorithms find all relevant query plans and experimentally evaluate the performance of our second algorithm in a Cloud computing scenario.
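For the piecewise-linear cost functions the paper assumes, plan comparison must hold over a whole parameter range rather than at a single point. The following is a minimal sketch, not the paper's algorithm: function names and cost values are hypothetical, and the endpoint test covers only the purely linear, single-metric case (the difference of two linear functions is linear, so its sign over an interval is determined at the endpoints):

```python
def linear_cost(slope, intercept):
    """Plan cost as a linear function of one parameter (e.g., selectivity)."""
    return lambda p: slope * p + intercept

def dominated_on_interval(f, g, lo, hi):
    """True if the plan with cost f is never cheaper than the plan with
    cost g anywhere on [lo, hi] (valid because both are linear)."""
    return f(lo) >= g(lo) and f(hi) >= g(hi)

scan = linear_cost(2.0, 1.0)    # cost grows with predicate selectivity
index = linear_cost(10.0, 0.1)  # cheap for selective predicates, then expensive

# Neither plan dominates the other over [0, 1]: index wins for small
# parameter values, scan for large ones, so both are relevant.
print(dominated_on_interval(scan, index, 0.0, 1.0),
      dominated_on_interval(index, scan, 0.0, 1.0))
```

With multiple parameters and multiple metrics, this check generalizes to the linear-programming comparison the paper uses for its piecewise-linear specialization.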
- Book Chapter
2
- 10.1007/978-981-10-0448-3_18
- Jan 1, 2016
Classical query optimization compares candidate plans according to a single cost metric and cannot handle multiple costs. Multi-objective parametric query optimization is, in principle, capable of optimizing over multiple cost metrics and query parameters simultaneously. This paper demonstrates a multi-objective parametric query optimization (MPQO) approach for advanced database systems such as distributed database systems (DDBS). Equivalent query plans are compared according to multiple cost metrics and query-related parameters (modeled as functions of the metrics); cost metrics and query parameters are semantically different and are computed at different stages of optimization. MPQO also generalizes parametric query optimization by catering to multiple metrics. The paper further analyzes the performance of MPQO variants based on nature-inspired optimization, namely the Multi-Objective Genetic Algorithm, and on the parameter-less Teaching-Learning-Based Optimization. MPQO builds a parametric space of query plans and progressively explores the multi-objective space according to user tradeoffs on query metrics. In heterogeneous and distributed database systems, logically unified data is replicated and distributed across multiple sites to achieve a highly reliable and available data system; this imposes a challenge on the evaluation of the Pareto set. MPQO attempts to exhaustively determine the optimal query plans at each end of the parametric space.
- Research Article
5
- 10.1145/3068612
- Sep 25, 2017
- Communications of the ACM
We propose a generalization of the classical database query optimization problem: multi-objective parametric query (MPQ) optimization. MPQ compares alternative processing plans according to multiple execution cost metrics. It also models missing pieces of information on which plan costs depend as parameters. Both features are crucial to model query processing on modern data processing platforms. MPQ generalizes previously proposed query optimization variants, such as multi-objective query optimization, parametric query optimization, and traditional query optimization. We show, however, that the MPQ problem has different properties than prior variants and solving it requires novel methods. We present an algorithm that solves the MPQ problem and finds, for a given query, the set of all relevant query plans. This set contains all plans that realize optimal execution cost tradeoffs for any combination of parameter values. Our algorithm is based on dynamic programming and recursively constructs relevant query plans by combining relevant plans for query parts. We assume that all plan execution cost functions are piecewise-linear in the parameters. We use linear programming to compare alternative plans and to identify plans that are not relevant. We present a complexity analysis of our algorithm and experimentally evaluate its performance.
- Research Article
13
- 10.1145/2949741.2949748
- Jun 2, 2016
- ACM SIGMOD Record
We propose a generalization of the classical database query optimization problem: multi-objective parametric query optimization (MPQ). MPQ compares alternative processing plans according to multiple execution cost metrics. It also models missing pieces of information on which plan costs depend as parameters. Both features are crucial to model query processing on modern data processing platforms. MPQ generalizes previously proposed query optimization variants such as multi-objective query optimization, parametric query optimization, and traditional query optimization. We show however that the MPQ problem has different properties than prior variants and solving it requires novel methods. We present an algorithm that solves the MPQ problem and finds for a given query the set of all relevant query plans. This set contains all plans that realize optimal execution cost tradeoffs for any combination of parameter values. Our algorithm is based on dynamic programming and recursively constructs relevant query plans by combining relevant plans for query parts. We assume that all plan execution cost functions are piecewise-linear in the parameters. We use linear programming to compare alternative plans and to identify plans that are not relevant. We present a complexity analysis of our algorithm and experimentally evaluate its performance.
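The dynamic-programming idea described in the abstract can be sketched as follows. This is a hedged illustration only: constant cost vectors stand in for the paper's piecewise-linear cost functions and linear-programming comparisons, and all table names, scan costs, and the fixed per-join overhead are invented:

```python
from itertools import combinations

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def prune(entries):
    """Keep only Pareto-optimal (cost-vector, plan) pairs."""
    return [e for e in entries
            if not any(dominates(f[0], e[0]) for f in entries if f is not e)]

def optimize(tables, scan_costs, join_overhead):
    """Bottom-up DP over table subsets, pruning dominated subplans.

    Relevant plans for a subset are built by combining relevant plans
    for its parts, mirroring the recursive construction in the paper.
    """
    best = {frozenset([t]): [(scan_costs[t], t)] for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            entries = []
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    for lc, lp in best[left]:
                        for rc, rp in best[right]:
                            cost = tuple(a + b + o for a, b, o
                                         in zip(lc, rc, join_overhead))
                            entries.append((cost, (lp, rp)))
            best[subset] = prune(entries)
    return best[frozenset(tables)]

# Toy instance: three tables, two metrics (time, fee); values invented.
frontier = optimize(["R", "S", "T"],
                    {"R": (1, 3), "S": (2, 2), "T": (3, 1)},
                    join_overhead=(1, 1))
```

Because the join overhead here is constant, every complete plan in this toy instance ends up with the same cost vector; with order-dependent costs, `prune` discards dominated subplans early and keeps the frontier small.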
- Research Article
17
- 10.1007/s00265-017-2309-1
- Apr 17, 2017
- Behavioral Ecology and Sociobiology
Theory predicts a trade-off between current reproduction and future reproduction or survival. Nevertheless, costs of reproduction are often not found owing to heterogeneity in environment or individuals, or to studies not evaluating multiple costs or reproductive metrics that could influence costs. Detecting costs of reproduction is further complicated by the fact that they may increase with age if physiological condition declines in older individuals, or decrease with age if younger individuals are less efficient at acquiring resources than older individuals. We used a 37-year study to evaluate costs of reproduction in song sparrows (Melospiza melodia). Our results support theoretical expectations for short-lived species by demonstrating costs of reproduction on future survival, but not on future reproduction. We examined two metrics of reproductive allocation—reproductive effort and termination of breeding—and found that only higher reproductive effort increased costs. Thus, testing of multiple allocation metrics may be necessary because results may not be coincident between metrics. Lastly, we observed that younger females paid higher costs of reproduction than older females. Although older female sparrows senesced, they had lower costs of reproduction than younger females who may be less able to acquire food or high-quality mates. By taking into account variation among individuals and examining multiple metrics, our study provides strong support for costs of reproduction and a decrease in costs with age. One premise of life history theory is that reproduction is costly, but evidence of such costs remains mixed. Mixed results may arise because resources are not always limiting for all individuals or in all environments, leading to temporal, spatial, and individual heterogeneity in trade-offs between current reproduction and future reproduction or survival that can be hard to detect. 
Costs may also vary with age, depending on how resource acquisition and allocation vary with age. We used a 37-year study of female song sparrows (Melospiza melodia) to control statistically for individual and environmental heterogeneity and test multiple cost metrics. We demonstrate marked costs of reproduction on future survival, particularly in young females.
- Research Article
- 10.21268/20200123-1
- Jan 31, 2021
One of the main challenges in software systems development is reusability. Traditional software systems interact through their interfaces. Thus, as more software systems are developed, the complexity of the interconnections between interfaces grows dramatically. This complexity leads to longer development times and lower code quality. The role of a subject-matter expert (SME), or domain expert, a person with knowledge in a specific area, emerged to tackle this problem and to increase software reusability by modeling the domain rather than the technology. It resulted in collaboration between domain experts and developers and sped up the software development life cycle. Similarly, the role of DevOps emerged to bridge the gap between developers and operators in order to automate the process between software development and production release. The main challenge today is to achieve reusability and interconnection between domains of the same or different interests. The ontology concept was introduced as an engineering artifact to describe a reality semantically. Ontologies aim to represent information in a way that can be understood and processed by a computer. An ontology engineer is therefore a person with knowledge of ontology vocabulary, rules of inference, logic, and ontology construction. Common domain description languages often lack a semantic representation of their entities. By contrast, the semantic meaning of a domain of interest can be expressed using an ontology's vocabularies and axioms. The inadequate conceptual knowledge representation of a domain expert and the inadequate domain knowledge of an ontology engineer yield a gap between both worlds. Thus, the use of ontologies in domain modeling is still considered a challenge. Domain engineering is an approach that aims at the creation and development of domains on semantic bases.
This work focuses on investigating the challenges and limitations of syntactic- and semantic-based development. The main objective is to present domain engineering as a solution for bridging the gap between domain experts and ontology engineers. The study also introduces a domain knowledge development life cycle to support the creation and development of domain representations on semantic bases. The research is conducted in three segments: first, defining the problem and its limitations; then, conducting an extensive review of the literature and related work to scrutinize the roles of domain experts and ontology engineers against criteria in their scope of intersection; and finally, investigating existing approaches and tools to construct the domain knowledge life cycle toolchain. The study concludes that the domain engineering approach requires the essential presence of both domain knowledge, contributed by the domain expert, and the formalization of semantic conceptualization, contributed by the ontology engineer.
- Conference Article
16
- 10.5555/1182635.1164186
- May 5, 2011
The contents of Web databases are accessed through queries formulated on complex user interfaces. In many domains of interest (e.g. Auto) users are interested in obtaining information from alternative sources. Thus, they have to access many individual Web databases via query interfaces. We aim to construct automatically a well-designed query interface that integrates a set of interfaces in the same domain. This will permit users to access information uniformly from multiple sources. Earlier research in this area includes matching attributes across multiple query interfaces in the same domain and grouping related attributes. In this paper, we investigate the naming of the attributes in the integrated query interface. We provide a set of properties which are required in order to have consistent labels for the attributes within an integrated interface so that users have no difficulty in understanding it. Based on these properties, we design algorithms to systematically label the attributes. Experimental results on seven domains validate our theoretical study. In the process of naming attributes, a set of logical inference rules among the textual labels is discovered. These inferences are also likely to be applicable to other integration problems sensitive to naming: e.g., HTML forms, HTML tables or concept hierarchies in the semantic Web.
- Research Article
- 10.6100/ir735354
- Nov 18, 2015
Marangoni flows induced by non-uniform surfactant distributions
- Research Article
1
- 10.1609/icaps.v33i1.27187
- Jul 1, 2023
- Proceedings of the International Conference on Automated Planning and Scheduling
The Multi-Objective Multi-Agent Path Finding (MO-MAPF) problem is the problem of finding the Pareto-optimal frontier of collision-free paths for a team of agents while minimizing multiple cost metrics. Examples of such cost metrics include arrival times, travel distances, and energy consumption. In this paper, we focus on the Multi-Objective Conflict-Based Search (MO-CBS) algorithm, a state-of-the-art MO-MAPF algorithm. We show that the standard splitting strategy used by MO-CBS can lead to duplicate search nodes and hence can duplicate the search effort of MO-CBS. To address this issue, we propose two new splitting strategies for MO-CBS, namely cost splitting and disjoint cost splitting. Our theoretical results show that, when using either splitting strategy, MO-CBS maintains its completeness and optimality guarantees. Our experimental results show that disjoint cost splitting, our best splitting strategy, speeds up MO-CBS by up to two orders of magnitude and substantially improves its success rates in various settings.
- Conference Article
4
- 10.1145/2882903.2882927
- Jun 26, 2016
Query plans are compared according to multiple cost metrics in multi-objective query optimization. The goal is to find the set of Pareto plans realizing optimal cost tradeoffs for a given query. So far, only algorithms with exponential complexity in the number of query tables have been proposed for multi-objective query optimization. In this work, we present the first algorithm with polynomial complexity in the query size. Our algorithm is randomized and iterative. It improves query plans via a multi-objective version of hill climbing that applies multiple transformations in each climbing step for maximal efficiency. Based on a locally optimal plan, we approximate the Pareto plan set within the restricted space of plans with similar join orders. We maintain a cache of Pareto-optimal plans for each potentially useful intermediate result to share partial plans that were discovered in different iterations. We show that each iteration of our algorithm performs in expected polynomial time based on an analysis of the expected path length between a random plan and local optima reached by hill climbing. We experimentally show that our algorithm can optimize queries with hundreds of tables and outperforms other randomized algorithms such as the NSGA-II genetic algorithm over a wide range of scenarios.
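The climbing step described above can be sketched with a dominance-based hill climber over join orders. This is a minimal illustration under stated assumptions: the toy two-metric cost model, table sizes, and swap neighborhood are invented, and the paper's multi-transformation climbing steps and Pareto plan caches are elided:

```python
import random

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def cost(order):
    # Toy 2-metric cost of a join order (values invented): both metrics
    # happen to penalize placing large tables late, so dominating moves
    # exist; real metrics would trade off against each other, and the
    # climb would then stop at a Pareto-local optimum.
    return (sum(i * t for i, t in enumerate(order)),
            sum(i * t * t for i, t in enumerate(order)))

def neighbors(order):
    # All orders reachable by swapping two tables.
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            n = list(order)
            n[i], n[j] = n[j], n[i]
            yield tuple(n)

def hill_climb(table_sizes, seed=0):
    """Start from a random order; move whenever a neighbor's cost
    vector Pareto-dominates the current one."""
    cur = tuple(random.Random(seed).sample(table_sizes, len(table_sizes)))
    improved = True
    while improved:
        improved = False
        for n in neighbors(cur):
            if dominates(cost(n), cost(cur)):
                cur, improved = n, True
                break
    return cur

# Under this cost model the climb sorts tables by descending size.
print(hill_climb([1, 2, 3, 4]))
```

Each accepted move strictly decreases the first metric, which bounds the number of climbing steps, mirroring the expected-path-length argument the abstract alludes to.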
- Research Article
- 10.22037/tpps.v1i4.18019
- Feb 5, 2018
The targeting of drugs to block protein-protein interactions (PPIs) has attracted great interest over recent years. Such targets, however, have been held to be difficult to inhibit using low molecular weight compounds, and as a consequence they are often branded as “undruggable”. This is partly because the interfaces involved are seen to be large, and the fact that they are generally regarded as being too smooth and too flat. In the work reported here, a series of quantitative systematic studies have been performed to determine the molecular area, roughness, curvature, and amino acid composition of the interfacial surfaces of PPIs, to determine the feasibility of designing small molecule drugs to inhibit these interactions. The X-ray crystal structures are analysed for a set of 48 PPIs involving G-protein, membrane receptor extracellular domain, and enzyme-inhibitor complexes. The protein partners involved in these PPIs are shown to have much larger interfacial areas than those for protein-small molecule complexes (≥ 900 Å² vs ~250 Å², respectively), and they have interfaces that are fairly smooth (with fractal dimensions close to 2) and quite flat (with mean surface curvatures in the order of ± 0.1 Å⁻¹). The mean interfacial surface curvatures of the PPI protein partners, however, are seen to change upon complexation, some very significantly so. Despite the fact that the amino acid compositions of the PPI interface surfaces are found to be significantly different from that of the average protein surface (with variations according to the type of PPI), it is concluded that the prospects for designing low molecular weight PPI inhibitors that act in an orthosteric manner remain rather limited.
HIGHLIGHTS
• Mean interfacial surface curvatures have been determined for protein-protein interaction (PPI) partners in their complexed and uncomplexed states.
• Mean interfacial surface roughnesses have been determined for protein-protein interaction (PPI) partners in their complexed and uncomplexed states.
• Amino acid compositions have been determined for PPI interface surfaces and these compared with that for the average protein surface.
• Quantification of the PPI interfacial surface properties is used to assess the druggability of these targets.
- Research Article
- 10.18453/rosdok_id00002178
- Jan 1, 2009
The need for tailor-made data management is apparent in database research. There is an impressive body of research on extensible or customizable DBMS like kernel systems or component DBMS. Nevertheless, all these approaches have drawbacks, e.g., regarding customizability or performance, and there is no general solution for efficient development of tailorable DBMS. Novel software engineering techniques (e.g., software product lines and feature-oriented programming) can help in developing customizable DBMS and reduce the complexity of building variable solutions, as we could show for embedded systems [2]. Applying such an approach to other domains like stream processing, column stores, XML databases, etc., promises to create DBMS that provide only needed functionality, can be better tuned and maintained, and enable reuse within a domain and even between different domains. Customizability is also required at the level of the query language, i.e., SQL, which grows with every new standard. Independent of the actual task, which may be as simple as querying a single value, developers are confronted with the full arsenal of SQL. Furthermore, applications only use a small subset of the standard and require special extensions, e.g., for sensor networks or stream processing. This results in SQL dialects that are used in different domains and are decoupled from the SQL standard. We could show that SQL (i.e., the grammar and the parser) can be decomposed with a feature-oriented approach. This results in a family of SQL dialects from which tailor-made SQL parsers, e.g., for sensor networks, can be generated [1]. By combining tailor-made SQL dialects and customizable DBMS, we think that one can build DBMS that can be fully tailored to an application domain or a special use case within a domain while providing high reuse. Creating such fully customizable systems is a highly challenging task.
It involves customizability at all levels of a DBMS, such as the query processor and optimizer, the transaction subsystem, and the storage system. Because of interdependencies between those subsystems, an architecture has to be developed that can handle the resulting complexity. For example, changing the query language affects the whole DBMS, including the query optimizer, which is highly connected with all other parts of a DBMS. New extensibility mechanisms as provided by feature-oriented programming can help in building such variable systems. However, this is probably not sufficient, and the community has to investigate an architecture and new mechanisms for handling variability that preserve the maintainability and performance of DBMS.
- Book Chapter
3
- 10.1007/978-3-540-71500-9_14
- Jan 1, 2007
Active and programmable network technologies strive to support completely new forms of data-path processing capabilities inside the network. This in conjunction with the ability to dynamically deploy such active services at strategic locations inside the network enables totally new types of applications. In this paper we exploit these network-side programming capabilities to realise a new active network application that dynamically evaluates network link costs based on in-line traffic measurements. The performance experienced by the data packets (e.g. delays, jitter and packet loss) along network or virtual links is used to compute link costs based on multiple cost metrics. The results are published by means of a routing metric broker, which enables available routing protocols to calculate different sets of routes for different QoS metrics - as for example suggested for ToS-based routing (RFC 1583).
- Research Article
5
- 10.1184/r1/6602801.v1
- Oct 17, 1983
Natural language communication with computers has long been a major goal of Artificial Intelligence, both for what it can tell us about intelligence in general and for its practical utility: databases, software packages, and AI-based expert systems all require flexible interfaces to a growing community of users who are not able or do not wish to communicate with computers in formal, artificial command languages. Whereas many of the fundamental problems of general natural language processing (NLP) by machine remain to be solved, the area has matured in recent years to the point where practical natural language interfaces to software systems can be constructed in many restricted, but nevertheless useful, circumstances. This tutorial is intended to survey the current state of applied natural language processing by presenting computationally effective NLP techniques, by discussing the range of capabilities these techniques provide for NLP systems, and by discussing their current limitations. Following the introduction, this document is divided into two major sections: the first on language recognition strategies at the single sentence level, and the second on language processing issues that arise during interactive dialogues. In both cases, we concentrate on those aspects of the problem appropriate for interactive natural language interfaces, but relate the techniques and systems discussed to more general work on natural language, independent of application domain.
- Book Chapter
13
- 10.1007/3-540-58907-4_5
- Jan 1, 1995
A multidatabase system (MDBS) is a database system which integrates pre-existing databases, called component local database systems (LDBSs), to support global applications accessing data at more than one LDBS. An important research issue in MDBS is query optimization. The query optimization problem in MDBS is quite different from the case of distributed database systems (DDBS) since, due to schema heterogeneity and local autonomy of component LDBSs, it is not possible to assume that the query optimizer has complete information on execution costs and database statistics. In this paper we present a distributed query optimization algorithm that works under very general assumptions for MDBSs with a relational global data model. The algorithm is based on the idea of delegating the evaluation of the execution cost of the elementary steps in a query execution plan to the LDBS where the computation is performed. The optimization process is organized as a sequence of steps, in which at each step all LDBSs work in parallel to evaluate the cost of execution plans for partial queries of increasing size, and send their cost estimates to the other LDBSs that need them for the next step. The computation is totally distributed, and organized so as to perform no duplicate computation and to discard as soon as possible the execution plans that may not lead to an optimal solution.