Quality Of Queries Research Articles

Technical Q&A sites (e.g., Stack Overflow (SO)) are important resources for developers to search for knowledge about technical problems. Search engines provided in Q&A sites and information retrieval approaches (e.g., word embedding-based) have limited ability to retrieve relevant questions when queries are imprecisely specified, for example when they omit important technical details (e.g., the user’s preferred programming language). Although many automatic query expansion approaches have been proposed to improve the quality of queries by expanding them with relevant terms, these approaches do not identify the information missing from a query. Moreover, without user involvement, the existing query expansion approaches may introduce unexpected terms and lead to undesired results. In this paper, we propose an interactive query refinement approach for question retrieval, named Chatbot4QR, which can assist users in recognizing and clarifying technical details missing from queries and thus retrieve more relevant questions for users. Chatbot4QR automatically detects missing technical details in a query and generates several clarification questions (CQs) to interact with the user and capture the technical details they overlooked. To ensure the accuracy of CQs, we design a heuristic-based approach for CQ generation after building two kinds of technical knowledge bases: a manually categorized result of 1,841 technical tags in SO and the multiple version-frequency information of the tags. We develop a Chatbot4QR prototype that uses 1.88 million SO questions as the repository for question retrieval. To evaluate Chatbot4QR, we conduct six user studies with 25 participants on 50 experimental queries. The results are as follows. (1) On average, 60.8 percent of the CQs generated for a query are useful for helping the participants recognize missing technical details. (2) Chatbot4QR can rapidly respond to the participants after receiving a query, within approximately 1.3 seconds.
(3) The refined queries contribute to retrieving more relevant SO questions than nine baseline approaches. For more than 70 percent of the participants who have preferred techniques on the query tasks, Chatbot4QR significantly outperforms the state-of-the-art word embedding-based retrieval approach with an improvement of at least 54.6 percent in terms of two measurements: Pre@k and NDCG@k. (4) For 48-88 percent of the assigned query tasks, the participants obtain more desired results after interacting with Chatbot4QR than directly searching from Web search engines (e.g., the SO search engine and Google) using the original queries.
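
The abstract reports results in terms of Pre@k and NDCG@k without defining them. As a rough illustration only, the sketch below uses the common textbook forms of precision at k and normalized discounted cumulative gain at k. The function names, the example relevance labels, and the choice to compute the ideal ranking from the retrieved list itself are assumptions made here, not details taken from the paper.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k results judged relevant (label > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevance[:k] if rel > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each label is discounted by log2(rank + 1)."""
    # enumerate() is 0-based, so the 1-based rank is position + 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the best possible ordering.

    Assumption: the ideal ordering is built from the same retrieved list,
    not from every relevant item in the repository.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for the top five retrieved questions.
labels = [1, 0, 1, 0, 0]
print(precision_at_k(labels, 5))
print(ndcg_at_k(labels, 3))
```

Both measures reward relevant questions in the top k, but only NDCG@k is sensitive to where in the top k they appear, which is one reason the two are often reported together.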

Context: Since the mid-2000s, numerous recommendation systems based on text retrieval (TR) have been proposed to support software engineering (SE) tasks such as concept location, traceability link recovery, code reuse, and impact analysis. The success of TR-based solutions depends heavily on the query submitted, which is either formulated by the developer or automatically extracted from software artifacts. Aim: We aim to predict the quality of queries submitted to TR-based approaches in SE. This can benefit both developers and the quality of software systems. For example, knowing when a query is poorly formulated can save developers the time and frustration of analyzing irrelevant search results; instead, they could focus on reformulating the query. Also, knowing that an artifact used as a query leads to irrelevant search results may uncover underlying problems in the query artifact itself. Method: We introduce an automatic query quality prediction approach for software artifact retrieval by adapting NL-inspired solutions for use on software data. We present two applications and evaluations of the approach in the context of concept location and traceability link recovery, where TR has been applied most often in SE. For concept location, we use the approach to determine whether the list of retrieved code elements is likely to contain code relevant to a particular change request; if it is not, the queries are good candidates for reformulation. For traceability link recovery, the queries represent software artifacts. In this case, we use the query quality prediction approach to identify artifacts that are hard to trace to other artifacts and may therefore have a low intrinsic quality for TR-based traceability link recovery. Results: For concept location, the evaluation shows that our approach is able to correctly predict the quality of queries in 82% of the cases, on average, using very little training data.
In the case of traceability link recovery, the proposed approach is able to detect hard-to-trace artifacts in 74% of the cases, on average. Conclusions: The results of our evaluation on applications for concept location and traceability link recovery indicate that our approach can be used to predict the results of a TR-based approach by assessing the quality of the text query. This can save effort and time, and can help identify software artifacts that may be difficult to trace using TR.

Articles published on Quality Of Queries

Soft prompt tuning for augmenting dense retrieval with large language models

Machine learning classifiers to predict the quality of semantic web queries

Integrating a new knowledge organisation system for monoclonal antibodies for therapeutic use authorised in Europe into HeTOP terminology-ontology server

Ontology-Based Semantic Search Framework for Disparate Datasets

Chatbot4QR: Interactive Query Refinement for Technical Question Retrieval

Auditing the Information Quality of News-Related Queries on the Alexa Voice Assistant

Corpus processing service: A Knowledge Graph platform to perform deep data exploration on corpora

Towards efficient top-k fuzzy auto-completion queries

ManQ: Many-objective optimization-based automatic query reduction for IR-based bug localization

Answering Why-Not Group Spatial Keyword Queries

Improving Query Quality for Transductive Learning in Learning to Rank

Deep learning the semantics of change sequences for query expansion

Predicting Query Quality for Applications of Text Retrieval to Software Engineering Tasks

Efficient Processing of Relevant Nearest-Neighbor Queries

IJARCCE - Computer and Communication Engineering

Supporting Search-As-You-Type Using SQL in Databases

A new spatio-temporal prediction approach based on aggregate queries

An Efficient Concept-based Mining Model for Deriving User Profiles

On query processing in wireless sensor networks using classes of quality of queries

High-performance processing of text queries with tunable pruned term and term pair indexes
