Query Logs Research Articles

Time is an important dimension of information retrieval that can be very useful in meeting users' information needs whenever they include temporal intents. However, retrieving the information that meets the query demands is not an easy process. The ambiguity of the query is traditionally one of the causes impeding the retrieval of relevant information. This is particularly evident in the case of temporal queries, where users tend to be subjective when expressing their intents (e.g., "avatar movie" instead of "avatar movie 2009"). Determining the possible times of the query is therefore of the utmost importance when attempting to achieve better disambiguated results and to enable new forms of exploring them. In this thesis, we present our contributions to disambiguating implicit temporal queries in a real-world environment, i.e., the Web. To better understand this type of query, three directions may be followed: information extracted from (1) metadata, (2) query logs or (3) document contents. Within the context of this thesis, we focus on the last of these. However, unlike existing approaches, we do not resort to a classification methodology. Instead, we seek to detect relevant temporal expressions based on corpus statistics and a general similarity measure that makes use of co-occurrences of words and years extracted from the contents of the documents. Moreover, our methodology tends to be mostly language-independent, as we do not use any linguistic-based techniques. Instead, we use a rule-based model solution supported by regular expressions. Based on this, we start by performing a comprehensive study of the temporal value of web documents, particularly web snippets, showing that this type of collection is a valuable data source in the process of dating implicit temporal queries. We then develop two methods: a temporal similarity measure, called Generic Temporal Evaluation (GTE), that evaluates the correlation between the query and the candidate dates identified, and a threshold-based classifier, known as GTEClass, that selects the most relevant dates while filtering out the non-relevant or incorrect ones. Subsequently, we propose two different applications named GTE-Cluster and GTE-Rank. The first uses the determined time of the queries to improve search results exploration. For this purpose, we propose a flat temporal clustering model solution where documents are grouped at the year level. GTE-Rank, in turn, uses the same information to temporally re-rank the web search results. We employ a combination approach that considers words and temporal scores, where documents are ranked to reflect the relevance of the snippet for the query, both in the
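The rule-based, regular-expression step described in this abstract can be illustrated with a small sketch. It is not GTE or GTEClass, whose formulas are not given here; it is a deliberately simple stand-in that extracts four-digit years from snippets and keeps those mentioned in a large enough share of them. The names `YEAR_RE`, `candidate_years`, `relevant_years`, the `min_share` parameter and the sample snippets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "year" is a four-digit number from 1000 to 2099 that is not
# part of a longer run of digits (so phone numbers and IDs are skipped).
YEAR_RE = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def candidate_years(snippets):
    """Count, for each year, the number of snippets that mention it."""
    counts = Counter()
    for text in snippets:
        # set() so a year repeated inside one snippet is counted once.
        counts.update(set(YEAR_RE.findall(text)))
    return counts


def relevant_years(snippets, min_share=0.5):
    """Keep years mentioned in at least `min_share` of the snippets.

    This share-based filter is a hypothetical stand-in for a
    threshold-based classifier, not the method from the thesis.
    """
    total = len(snippets)
    counts = candidate_years(snippets)
    return sorted(y for y, c in counts.items() if total and c / total >= min_share)


# Hypothetical snippets returned for the implicit temporal query "avatar movie".
snippets = [
    "Avatar is a 2009 science fiction film directed by James Cameron.",
    "Avatar (2009) - release date, cast and reviews.",
    "Avatar sequel announced; the original premiered in December 2009.",
    "Call 18005551234 for tickets.",
]

print(relevant_years(snippets))
```

A share of snippets is used here only because it is easy to inspect; a measure based on word and year co-occurrences, as the abstract describes, would replace the `c / total` score.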


Background: The Internet is a major source of health information, but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would also be useful to clean those queries that are misspelled. In this paper, we propose a simple yet efficient method to correct misspellings of queries submitted by health information seekers to a medical online search tool.

Methods: In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction, using the thresholds retained from the first run.

Results: According to the total number of suggestions (around 163, the size of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%), and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with thresholds of 0.2 and 0.7 respectively. However, queries are composed of several terms that may be combinations of medical terms. A process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling correction.

Conclusions: Despite the widely known high performance of the normalized Levenshtein edit distance, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records.
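As a minimal sketch of one of the two comparators named above, the following computes a normalized Levenshtein edit distance and uses it to suggest vocabulary terms for a misspelled query. It assumes the distance is normalized by the length of the longer string and that the threshold is a maximum allowed distance; the abstract does not state either choice. The function names, the `max_distance` parameter and the tiny vocabulary are hypothetical, and the Stoilos function and the phonetic matching step are not covered.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def suggest(query, vocabulary, max_distance=0.3):
    """Return vocabulary terms within `max_distance` of the query, closest first.

    Assumption: the threshold is an upper bound on the normalized distance.
    """
    scored = [(normalized_distance(query.lower(), term.lower()), term)
              for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


# Hypothetical French medical vocabulary and a query typed without accents.
vocabulary = ["diabète", "diarrhée", "asthme"]
print(suggest("diabete", vocabulary))
```

Lowering `max_distance` makes the matcher stricter, which trades recall for precision; that trade-off is what a threshold study on a sample of query logs would tune.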


Articles published on Query Logs

Query-Log Aware Replicated Declustering

Utility preserving query log anonymization via semantic microaggregation

Learning a hybrid similarity measure for image retrieval

Intent mining in search query logs for automatic search script generation

A New Algorithm for Inferring User Search Goals with Feedback Sessions

Mining subtopics from text fragments for a web query

Disambiguating implicit temporal queries for temporal information retrieval applications

A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

Finding information in books: Characteristics of full‐text searches in a collection of 10 million books

Mining subtopics from different aspects for diversifying search results

Pragmatic correlation analysis for probabilistic ranking over relational data

Learning to rank query suggestions for adhoc and diversity search

ANEEC: A Quasi-Automatic System for Massive Named Entity Extraction and Categorization

One Size Does Not Fit All: Toward User- and Query-Dependent Ranking for Web Databases

Matching health information seekers' queries to medical terms.

Entity Synonyms for Structured Web Search

Personalized ranking in web databases: establishing and utilizing an appropriate workload

Query Recommendation for Optimizing the Search Engine Results

Optimizing Search Engine Result using Intelligent Model

Knowledge-based scheme to create privacy-preserving but semantically-related queries for web search engines
