Search Engine Query Logs Research Articles

Background: Digital epidemiology tries to identify disease dynamics and spread behaviors using digital traces collected from search engine logs and social media posts. However, the impact of news on information-seeking behaviors has remained unknown.

Methods: The data employed in this research came from two sources: (1) 48 months of Parsijoo search engine query logs, and (2) a set of documents covering 28 months of Parsijoo’s news service. Two classes of topics, macro-topics and micro-topics, were selected to be tracked in query logs and news. Keywords for the macro-topics were generated automatically from web resources and exceeded 10k. The keyword set for the micro-topics was limited to an enumerable list of terms related to diseases and health-related activities. The tests were organized as three studies. Study A comprises temporal analyses of 7 macro-topics in query logs. Study B analyzes the seasonality of search patterns for 9 micro-topics, and Study C assesses the impact of news media coverage on users’ health-related information-seeking behaviors.

Results: Study A showed that the hourly distribution of the various macro-topics followed changes in the level of social activity. Conversely, the interestingness of macro-topics did not follow the regulation of topic distributions. Among macro-topics, “Pharmacotherapy” had the highest interestingness level and a wider time window of popularity. In Study B, the seasonality of a limited number of diseases and health-related activities was analyzed. Trends of infectious diseases such as flu, mumps, and chicken pox were seasonal. Because most diseases covered by national vaccination plans are seasonal, the trend for “Immunization and Vaccination” was seasonal as well. Cancer awareness events caused peaks in the search trends of the “Cancer” and “Screening” micro-topics on specific days of each year; these repeated patterns may be mistakenly identified as seasonality. In Study C, we assessed the co-integration and correlation between news and query trends. Our results demonstrated that micro-topics sparsely covered in news media had the lowest level of impressiveness and, subsequently, the lowest impact on users’ intents.

Conclusion: Our results can reveal public reaction to social events, diseases, and prevention procedures. Furthermore, we found that news trends are co-integrated with search queries and are able to reveal health-related events; however, they cannot be used interchangeably. It is recommended that user-generated content and news documents be analyzed mutually and interactively.
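As a rough illustration of the kind of comparison Study C describes, the sketch below computes the Pearson correlation between a news-volume series and a query-volume series at several lags. It covers correlation only and is not a co-integration test, and it is not the authors' procedure. The function names, the daily granularity, and the toy numbers are all assumptions made for the example.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    news: Sequence[float], queries: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate news[t] with queries[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        n = len(news) - lag  # number of overlapping points at this lag
        result[lag] = pearson(news[:n], queries[lag:lag + n])
    return result


# Hypothetical daily counts for one micro-topic (made-up data).
news = [1, 5, 2, 1, 1, 6, 2, 1]
queries = [3, 3, 9, 4, 3, 3, 10, 4]

result = lagged_correlation(news, queries, max_lag=2)
best_lag = max(result, key=result.get)
print(best_lag, round(result[best_lag], 3))
```

In this toy series the query peaks trail the news peaks by one step, so the correlation is strongest at a lag of 1. A high lagged correlation alone does not show that the two series are co-integrated.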


It is well established that extracting and annotating occurrences of entities in a collection of unstructured text documents with their concepts improves the effectiveness of answering queries over the collection. However, creating and maintaining large annotated collections is very resource intensive. Since the available resources of an enterprise are limited and/or its users may have urgent information needs, it may have to select only a subset of relevant concepts for extraction and annotation. We call this subset a conceptual design for the annotated collection. In this article, we introduce and formally define the problem of cost-effective conceptual design: given a collection, a set of relevant concepts, and a fixed budget, find a conceptual design that most improves the effectiveness of answering queries over the collection. We provide efficient algorithms for special cases of the problem and prove that it is generally NP-hard in the number of relevant concepts. We propose three efficient approximations to solve the problem: a greedy algorithm, approximate popularity maximization (APM for short), and approximate annotation-benefit maximization (AAM for short). We show that, if there are no constraints regarding the overlap of concepts, APM is a fully polynomial-time approximation scheme. We also prove that, if the relevant concepts are mutually exclusive, the greedy algorithm delivers a constant approximation ratio provided the concepts are equally costly, APM has a constant approximation ratio, and AAM is a fully polynomial-time approximation scheme. Our empirical results using a Wikipedia collection and a search engine query log validate the proposed formalization of the problem and show that APM and AAM efficiently compute conceptual designs. They also indicate that, in general, APM delivers the optimal conceptual designs if the relevant concepts are not mutually exclusive. Also, if the relevant concepts are mutually exclusive, the conceptual designs delivered by AAM improve the effectiveness of answering queries over the collection more than the solutions provided by APM.
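To make the budgeted selection problem concrete, here is a minimal sketch of one common greedy heuristic: rank concepts by benefit per unit cost and add them while the budget allows. The abstract does not specify how its greedy algorithm ranks concepts, so the ratio rule, the function name `greedy_design`, and the benefit and cost figures are assumptions for illustration, not the paper's method.

```python
from typing import Dict, List, Tuple


def greedy_design(
    concepts: Dict[str, Tuple[float, float]], budget: float
) -> List[str]:
    """Pick concepts greedily by benefit/cost ratio without exceeding the budget.

    `concepts` maps a concept name to (estimated benefit, annotation cost).
    Costs are assumed to be strictly positive.
    """
    # Highest benefit per unit cost first.
    ranked = sorted(
        concepts.items(),
        key=lambda item: item[1][0] / item[1][1],
        reverse=True,
    )
    chosen: List[str] = []
    spent = 0.0
    for name, (_benefit, cost) in ranked:
        if spent + cost <= budget:  # skip concepts that no longer fit
            chosen.append(name)
            spent += cost
    return chosen


# Hypothetical concepts with made-up benefits and costs.
concepts = {
    "person": (0.40, 3.0),
    "organization": (0.30, 2.0),
    "location": (0.20, 1.0),
    "date": (0.10, 2.0),
}

design = greedy_design(concepts, budget=4.0)
print(design)
```

With a budget of 4.0, the sketch selects "location" and then "organization", and skips "person" and "date" because neither fits in the remaining budget. A ratio-based greedy like this one is fast but carries no optimality guarantee in general, which is consistent with the abstract's interest in approximation ratios.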


Articles published on Search Engine Query Logs

Mining Domain Terminologies Using Search Engine's Query Log

Automatic prediction of news intent for search queries

The Effect of Big Data on Recommendation Quality. The Example of Internet Search

Search engines, news wires and digital epidemiology: Presumptions and facts

The Effect of Big Data on Recommendation Quality: The Example of Internet Search

Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

Data Mining Framework for Web Search Personalization

A machine learning approach for result caching in web search engines

Cross-Lingual Topic Discovery From Multilingual Search Engine Query Log

Inferring User Search Goals with feedback Sessions using Fuzzy K-Means Algorithm

New query suggestion framework and algorithms: A case study for an educational search engine

Query intent inference via search engine log

Query Topic Classification and Sociology of Web Query Logs

Query suggestion with diversification and personalization

Evaluation of Reranked Recommended Queries in Web Information Retrieval using NDCG and CV

Cost-Effective Conceptual Design for Information Extraction

A New Algorithm for Inferring User Search Goals with Feedback Sessions

Query ranking model for search engine query recommendation

QUERY RECOMMENDATIONS AND ITS EVALUATION IN WEB INFORMATION RETRIEVAL

SG-WSTD: A framework for scalable geographic web search topic discovery
