What Makes a Good Query?

Abstract

Searching for information in a goal-directed manner is central for learning, diagnosis, and prediction. Children ask questions to learn new concepts, doctors conduct medical tests to diagnose their patients, and scientists perform experiments to test their theories. But what makes a good query? What principles govern human information acquisition and how do people decide which query to conduct to achieve their goals? What challenges need to be met to advance the theory and psychology of human inquiry? Addressing these issues, we introduce the conceptual and mathematical ideas underlying different models of the value of information, what purpose these models serve in psychological research, and how they can be integrated in a unified computational framework. We also discuss the conflict between short- and long-term efficiency of prominent methods for query selection, and the resulting normative and methodological implications for studying human sequential search. A final point of discussion concerns the relations between probabilistic (Bayesian) models of the value of information and heuristic search strategies, and the insights that can be gained from bridging different levels of analysis and types of models. We conclude by discussing open questions and challenges that research needs to address to build a comprehensive theory of human information acquisition.
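The probabilistic models of the value of information surveyed here share a common Bayesian core. As a minimal illustration (not any one model from the article), the expected information gain of a query can be computed by averaging, over the query's possible answers, the reduction in Shannon entropy of the belief over hypotheses:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution (in bits)."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_information_gain(prior, likelihoods):
    """Expected reduction in uncertainty about the hypotheses from a
    query whose answers index the columns of `likelihoods`, where
    likelihoods[h][a] = P(answer a | hypothesis h)."""
    n_answers = len(likelihoods[0])
    eig = 0.0
    for a in range(n_answers):
        # Marginal probability of this answer under the prior.
        p_a = sum(prior[h] * likelihoods[h][a] for h in range(len(prior)))
        if p_a == 0:
            continue
        # Posterior over hypotheses given the answer (Bayes' rule).
        posterior = [prior[h] * likelihoods[h][a] / p_a for h in range(len(prior))]
        eig += p_a * (entropy(prior) - entropy(posterior))
    return eig

# Two equally likely hypotheses; a query whose answer perfectly
# discriminates them yields the full 1 bit of information.
prior = [0.5, 0.5]
likelihoods = [[1.0, 0.0],   # hypothesis 0 always answers "yes"
               [0.0, 1.0]]   # hypothesis 1 always answers "no"
print(expected_information_gain(prior, likelihoods))  # → 1.0
```

Other usefulness measures discussed in this literature (probability gain, impact, and so on) plug different value functions into the same expectation over answers.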

Similar Papers
  • Book Chapter
  • 10.1007/978-981-10-3322-3_12
Generating Distributed Query Plans Using Modified Cuckoo Search Algorithm
  • Jan 1, 2017
  • T V Vijay Kumar + 1 more

In distributed databases, data is replicated and fragmented across multiple disparate sites spread over a computer network. Consequently, a distributed query can have a very large number of possible query plans, and this number grows with the number of sites holding the replicated data. For large numbers of sites, computing an efficient query-processing plan becomes computationally expensive. This necessitates a distributed query-processing strategy capable of generating good-quality query plans, from among all possible plans, that minimize the total cost of processing a distributed query. This distributed query plan generation (DQPG) problem, being a combinatorial optimization problem, is addressed in this paper using a modified cuckoo search algorithm (mCSA). Accordingly, an mCSA-based DQPG algorithm (DQPG_mCSA), which aims to generate good-quality Top-K query plans for a given distributed query, is proposed. Experimental comparison of DQPG_mCSA with the existing GA-based DQPG algorithm (DQPG_GA) shows that the former generates comparatively better Top-K query plans, which in turn reduces query response time and thereby enables more efficient decision making.
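For readers unfamiliar with the method, a minimal cuckoo search for continuous minimization can be sketched as follows. This shows only the base algorithm (Lévy-flight steps around the best nest plus abandonment of the worst nests); the paper's modified CSA and its discrete encoding of query plans are not reproduced here:

```python
import random

def cuckoo_search(f, dim=2, n_nests=15, iters=200, pa=0.25, seed=1):
    """Minimal cuckoo search: propose new solutions by heavy-tailed
    steps toward the best nest, replace a random nest if the proposal
    is better, and abandon the worst pa fraction each iteration."""
    rng = random.Random(seed)

    def levy():
        # Simplified Mantegna-style heavy-tailed step.
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        return u / abs(v) ** 0.5

    nests = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_nests)]
    best = min(nests, key=f)
    for _ in range(iters):
        for i in range(n_nests):
            new = [x + 0.01 * levy() * (x - b) for x, b in zip(nests[i], best)]
            j = rng.randrange(n_nests)
            if f(new) < f(nests[j]):      # keep the proposal if it beats
                nests[j] = new            # a randomly chosen nest
        nests.sort(key=f)                 # abandon the worst pa fraction
        for i in range(int((1 - pa) * n_nests), n_nests):
            nests[i] = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(nests + [best], key=f)
    return best

sphere = lambda x: sum(v * v for v in x)  # toy objective: minimum at origin
best = cuckoo_search(sphere)
print(round(sphere(best), 3))
```

In the DQPG setting, the objective would instead score a candidate distributed query plan by its total processing cost.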

  • Research Article
  • 15 citations
  • 10.1016/j.artint.2020.103328
On the equivalence of optimal recommendation sets and myopically optimal query sets
  • May 26, 2020
  • Artificial Intelligence
  • Paolo Viappiani + 1 more


  • Research Article
  • 13 citations
  • 10.5121/ijcsit.2013.5506
Improving the Effectiveness of Information Retrieval System Using Adaptive Genetic Algorithm
  • Oct 31, 2013
  • International Journal of Computer Science and Information Technology
  • Wafa Maitah + 2 more

The traditional genetic algorithm used in previous studies depends on fixed control parameters, especially the crossover and mutation probabilities; in this research we instead use an adaptive genetic algorithm. Genetic algorithms have been applied in information retrieval to optimize queries. A good query is a set of terms that accurately expresses the information need while being usable within the collection corpus; the latter part of this specification is critical for efficient matching, which is why most research effort goes into query improvement. We investigated the use of an adaptive genetic algorithm (AGA) under the vector space model, the extended Boolean model, and the language model in information retrieval (IR). The algorithm uses crossover and mutation operators with variable probabilities, whereas a traditional genetic algorithm (GA) uses fixed values that remain unchanged during execution. This adaptive adjustment of the mutation and crossover probabilities allows faster attainment of better solutions. The approach was tested on 242 Arabic abstracts collected from the proceedings of the Saudi Arabian National Conference.
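The adaptation idea can be sketched in a few lines. The toy GA below recomputes per-mating crossover and mutation probabilities from the population's fitness spread, so that fitter-than-average individuals are disrupted less (a common adaptive-GA rule, chosen for illustration; the paper's exact adaptation formula and IR fitness function are not reproduced):

```python
import random

def adaptive_ga(fitness, n_bits=16, pop_size=30, generations=60, seed=0):
    """Minimal adaptive GA on bit strings: crossover/mutation rates are
    recomputed each generation from the fitness spread rather than
    being fixed constants."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        f_avg, f_max = sum(fits) / len(fits), max(fits)

        def rates(f, pc_max=0.9, pm_max=0.1):
            # Fitter-than-average matings get lower disruption rates.
            if f_max == f_avg or f <= f_avg:
                return pc_max, pm_max
            scale = (f_max - f) / (f_max - f_avg)
            return pc_max * scale, pm_max * scale

        def select():
            # Binary tournament selection.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if fits[a] >= fits[b] else pop[b]

        new_pop = [max(pop, key=fitness)]        # elitism: keep the best
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            pc, pm = rates(max(fitness(p1), fitness(p2)))
            child = p1[:]
            if rng.random() < pc:                # one-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < pm else b for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

best = adaptive_ga(fitness=sum)   # toy objective: maximize the 1-bits
print(sum(best))
```

In the IR application, each bit string would instead encode query term weights, and the fitness function would score retrieval quality on the document collection.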

  • Conference Article
  • 4 citations
  • 10.1109/geoinformatics.2009.5292970
Integration of grid and OGC compliant services to implement the sharing and interoperability of multi-source and heterogeneous geospatial data
  • Aug 1, 2009
  • Hong Fan + 2 more

This paper integrates grid technology and OGC-compliant services to design and implement a geospatial web service platform for the western region's surveying and mapping project. The platform uses OGC common services and interoperability interfaces to provide efficient and secure data management, publishing, and querying in a heterogeneous computing environment, as well as multi-scale seamless display in 3D GeoGlobe. As an integration of grid technology and OGC Web Services, the platform offers advantages that traditional platforms cannot: openness, standardization, security, the ability to integrate heterogeneous services and support heterogeneous data types, and good query and visualization effectiveness.

  • Conference Article
  • 17 citations
  • 10.1145/3331184.3331243
Why do Users Issue Good Queries?
  • Jul 18, 2019
  • Lauri Kangassalo + 3 more

Despite advances in the past few decades in studying what kind of queries users input to search engines and how to suggest queries to users, the fundamental question of what makes human cognition able to estimate the goodness of query terms is largely unanswered. For example, a person searching for information about "cats" is able to choose query terms such as "housecat", "feline", or "animal" and avoid terms like "similar", "variety", and "distinguish". We investigated the association between the specificity of terms occurring in documents and human brain activity measured via electroencephalography (EEG). We analyzed the brain activity data of fifteen participants, recorded in response to reading terms from Wikipedia documents. Term specificity was shown to be associated with the amplitude of evoked brain responses. The results indicate that by being able to determine which terms carry maximal information about, and can best discriminate between, documents, people have the capability to enter good query terms. Moreover, our results suggest that the effective query term selection process, often observed in practical search behavior studies, has a neural basis. We believe our findings constitute an important step in revealing the cognitive processing behind query formulation and in evaluating the informativeness of language in general.

  • Book Chapter
  • 3 citations
  • 10.1007/978-3-540-73257-0_10
Computing Social Networks for Information Sharing: A Case-Based Approach
  • Jan 1, 2007
  • Rushed Kanawati + 1 more

In this paper we describe a peer-to-peer approach that aims to allow a group of like-minded people to share relevant documents implicitly. We suppose that users save their documents in a local, user-defined hierarchy. The association between documents and hierarchy nodes (folders) is used by a supervised hybrid neural-CBR classifier to learn the user's classification strategy. This strategy is then used to compute correlations between local and remote folders, allowing documents to be recommended without a shared hierarchy. Another CBR system memorizes how well queries are answered by peer agents, allowing a dynamic community of peer agents to be learned for each local folder.

  • Book Chapter
  • 190 citations
  • 10.1007/3-540-48762-x_61
Content-Based Image Retrieval Based on Local Affinely Invariant Regions
  • Jan 1, 1999
  • Tinne Tuytelaars + 1 more

This contribution develops a new technique for content-based image retrieval. Where most existing image retrieval systems focus mainly on color, color distribution, or texture, we classify images based on local invariants. These features represent the image in a very compact way and allow fast comparison and feature matching with images in the database. Using local features makes the system robust to occlusions and changes in the background; using invariants makes it robust to changes in viewpoint and illumination. Here, "similarity" is given a narrower interpretation than usual in the database retrieval literature: two images are similar if they represent the same object or scene. Finding such additional images is the subject of quite a few queries. To deal with large changes in viewpoint, a method to automatically extract local, affinely invariant regions has been developed. As shown by first experimental results on a database of 100 images, this yields an overall system with very good query results.

  • Conference Article
  • 10.1145/2554850.2555005
Recommend-Me
  • Mar 24, 2014
  • Thanh Duc Ngo + 3 more

In typical image retrieval systems, to search for an object, users must specify a region bounding the object in an input image. In some situations the queried region has no match among the regions in the images of the retrieved database. Finding a region in the input image that forms a good query, one that reliably returns relevant results, is tedious, because users must try all possible query regions without prior knowledge of which objects actually exist in the database. This paper presents a novel recommendation system, named Recommend-Me, which automatically recommends good query regions to users. To identify good query regions, their matches in the database must be found. A greedy solution that evaluates all possible region pairs, where a pair consists of one candidate region in the input image and one region in a database image, is infeasible. To avoid this, we propose a two-stage approach that significantly reduces the search space and the number of similarity evaluations. Specifically, we first use an inverted-index technique to quickly filter out the many images with insufficient similarity to the input image. We then propose and apply a novel branch-and-bound algorithm to efficiently identify the region pairs with the highest scores. We demonstrate the scalability and performance of our system on two public datasets of over 100K and 1 million images.
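The first, filtering stage can be illustrated with a toy inverted index mapping visual words to image ids (the function name and `min_overlap` threshold are illustrative assumptions; the second, branch-and-bound scoring stage is not shown):

```python
from collections import Counter

def candidate_images(query_words, index, min_overlap=2):
    """Stage 1 sketch: use an inverted index (visual word -> image ids)
    to discard images sharing too few visual words with the query
    region, so the expensive pairwise scoring only sees survivors."""
    hits = Counter()
    for w in query_words:
        for img in index.get(w, ()):
            hits[img] += 1
    return {img for img, n in hits.items() if n >= min_overlap}

index = {"w1": ["a", "b"], "w2": ["a"], "w3": ["c"]}
print(candidate_images(["w1", "w2", "w3"], index))  # → {'a'}
```

Only the surviving candidates ('a' here, which shares two visual words with the query) would be passed to the second-stage search over region pairs.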

  • Research Article
  • 3 citations
  • 10.1097/00124784-200603000-00006
Utah's IBIS-PH
  • Mar 1, 2006
  • Journal of Public Health Management and Practice
  • Garth Braithwaite + 1 more

The ability to make good Web-based data query system (WDQS) project management decisions requires an understanding of the trade-offs inherent among various technology options. This article presents the current options available for the user interface and data display and compares the advantages and disadvantages of each for use in a WDQS. Relevant options are also discussed for back-end technologies such as Web and application servers and data storage mechanisms. We explain our decisions in developing the Indicator-based Information System for Public Health (IBIS-PH) query system to increase the probability of success and minimize risk. Finally, we compare the resulting IBIS-PH application characteristics with our original design requirements: broad public access; rich, interactive, and easy-to-use interface; portability; accessibility; maintainability of software and interface; supportability; low cost; and security.

  • Conference Article
  • 10.1145/100348.100442
Query processing and file management issues in partitioned databases (abstract)
  • Jan 1, 1990
  • Esen Ozkarahan + 1 more

This study reviews database partitioning techniques and elaborates on features of storage organization from efficiency and query-processing standpoints. Methods for static files have excellent utilization records but require a variable number of disk accesses, are prone to overflows, and may need reorganization when changes are made. Dynamic file schemes with directories have good retrieval-query performance but tend to achieve low storage utilization, suffer from growing directories, and may propagate the effects of an update to several regions, possibly all the way up to the highest-level directory.

Partitioning very large databases has become an intense research area over the last few years. The characteristics of partitioned file organizations have profound effects on the efficiency of query processing and database updates. We have identified two main partitioning methodologies: static and dynamic files. While static files can achieve good utilization, they result in a variable number of disk accesses per query and need frequent file reorganizations due to updates. Dynamic files can achieve a constant number of disk accesses for all queries, but at the expense of relatively low utilization (for uniform as well as non-uniform data distributions); directory sizes may grow fast, and handling updates efficiently in extreme cases may be difficult.

To date, very little research has addressed updates of partitioned file organizations. However, as our discussion shows, there are important problems waiting to be tackled in this area, especially with respect to directory structures.

  • Book Chapter
  • 10.1007/978-1-4302-6188-9_11
Automated SQL Tuning
  • Jan 1, 2013
  • Sam R Alapati + 2 more

Prior to Oracle Database 11g, accurately identifying poorly performing SQL queries and recommending solutions was mainly the purview of veteran SQL tuners. Typically one had to know how to identify high-resource SQL statements and bottlenecks, generate and interpret execution plans, extract data from the dynamic performance views, understand wait events and statistics, and then collate this knowledge to produce good SQL queries. As you’ll see in this chapter, the Oracle SQL tuning paradigm has shifted a bit.

  • Conference Article
  • 16 citations
  • 10.1109/icde.2009.120
Exploring a Few Good Tuples from Text Databases
  • Mar 1, 2009
  • Proceedings - International Conference on Data Engineering
  • Alpa Jain + 1 more

Information extraction from text databases is a useful paradigm to populate relational tables and unlock the considerable value hidden in plain-text documents. However, information extraction can be expensive, due to the various complex text-processing steps needed to uncover the hidden data. A large number of text databases are available, and not every text database is relevant to every relation. Hence, it is important to be able to quickly gauge the utility of running an extractor for a specific relation over a given text database before carrying out the expensive extraction task. In this paper, we present a novel exploration methodology for finding a few good tuples for a relation that can be extracted from a database, which allows the relevance of the database for the relation to be judged. Specifically, we propose the notion of a good(k, ℓ) query as one that can return any k tuples for the relation among the top-ℓ fraction of tuples ranked by the aggregated confidence scores provided by the extractor; if these tuples have high scores, the database can be deemed relevant to the relation. We formalize the access model for information extraction and investigate efficient query-processing algorithms for good(k, ℓ) queries, which do not rely on any prior knowledge about the extraction task or the database. We demonstrate the viability of our algorithms in a detailed experimental study with real text databases.
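The acceptance condition of a good(k, ℓ) query can be restated directly in code. The sketch below checks a candidate result against the definition; for illustration it assumes the full score list is available up front, which is exactly the assumption the paper's algorithms avoid:

```python
def is_good_k_l(retrieved, all_scores, k, l_fraction):
    """Return True if `retrieved` (confidence scores of candidate
    tuples) contains at least k tuples ranking within the top
    l_fraction of `all_scores`."""
    cutoff_rank = max(1, int(l_fraction * len(all_scores)))
    # Lowest score that still ranks in the top l-fraction.
    threshold = sorted(all_scores, reverse=True)[cutoff_rank - 1]
    good = [s for s in retrieved if s >= threshold]
    return len(good) >= k

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
# Top-20% cutoff over 10 tuples keeps scores >= 0.8; two of the three
# retrieved tuples qualify, so a good(2, 0.2) query is satisfied.
print(is_good_k_l([0.9, 0.6, 0.8], scores, k=2, l_fraction=0.2))  # → True
```

The paper's contribution is query-processing strategies that reach such a verdict while scoring as few tuples as possible.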

  • Conference Article
  • 10 citations
  • 10.1109/ideas.1998.694365
Data compression in database systems
  • Jul 8, 1998
  • W.P Cockshott + 3 more

This paper addresses the question of how information-theoretically derived compact representations can be applied in practice to improve storage and processing efficiency in a DBMS. Compact data representation has the potential for savings in storage, access, and processing costs throughout the systems architecture, and may alter the balance of usage between disk and solid-state storage. To realise the potential performance benefits, however, novel systems engineering must be adopted to ensure that compression/decompression overheads are limited. This paper describes a basic approach to storing and processing relations in a highly compressed form. A vertical, columnwise representation is adopted in which columns can dynamically vary incrementally in both length and width. To achieve good performance, query processing is carried out directly on the compressed relational representation (using a compressed representation of the query), thus avoiding decompression overheads. Measurements of the performance of the Hi-base prototype implementation are compared with those obtained from a conventional DBMS.
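The key idea, evaluating predicates without decompressing, can be illustrated with simple dictionary encoding (an assumption for illustration; Hi-base's actual representation is richer). Compressing the query constant once lets the scan compare small integer codes instead of full values:

```python
def dict_encode(column):
    """Dictionary-encode a column: each distinct value gets a small
    integer code, in first-appearance order."""
    dictionary = {}
    codes = []
    for v in column:
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, codes

def select_eq(codes, dictionary, value):
    """Evaluate `column = value` directly on the compressed codes: the
    predicate constant is looked up in the dictionary once, and the
    column itself is never decompressed."""
    code = dictionary.get(value)
    if code is None:           # value absent from the column entirely
        return []
    return [i for i, c in enumerate(codes) if c == code]

d, codes = dict_encode(["red", "blue", "red", "green", "red"])
print(select_eq(codes, d, "red"))  # → [0, 2, 4]
```

The same principle extends to joins and aggregates, which is how decompression overheads stay off the query's critical path.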

  • Research Article
  • 10.1002/(sici)1520-684x(199708)28:9<1::aid-scj1>3.0.co;2-i
Index splitting for complex objects in parallel environments
  • Aug 1, 1997
  • Systems and Computers in Japan
  • Kazuhiro Ogura + 3 more

Many indexing techniques for complex objects have been developed, but most of them are intended for a single-machine environment. By dividing a large index into subindexes and placing each of them on a separate machine, we can get good efficiency of index operations through parallelism. In this paper, we propose an optimizing scheme for horizontal and vertical index splitting by considering parallel processing, assuming a wide variety of multi-processor environments. The optimizing method gives good retrieval query efficiency in the case where an attribute value of a nested object is specified, and it also improves retrieval throughput, that is, the average number of retrieval queries processed within a constant time. In our method, the multi-indexing scheme is used, in which index updating can be performed with minimal cost and index elements can be easily moved across machines.

  • Book Chapter
  • 3 citations
  • 10.1007/978-3-319-64283-3_12
Pre-processing and Indexing Techniques for Constellation Queries in Big Data
  • Jan 1, 2017
  • Amir Khatibi + 5 more

Geometric patterns are defined by a spatial distribution of a set of objects. They can be found in many spatial datasets, such as in seismology, astronomy, and transportation. A particularly interesting geometric pattern is exhibited by the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is a challenging problem, as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality, as well as indexing the data with a PH-tree to speed up the computation of star neighborhoods. We have implemented our techniques in Spark and evaluated them in a series of experiments. PH-tree indexing showed very good results and guarantees the completeness of query answers.
