Query Generation Research Articles

The article describes various ways to use generative pre-trained language models to build a corporate question-and-answer system. A significant limitation of current generative pre-trained language models is the cap on the number of input tokens, which prevents them from working "out of the box" with a large number of documents or with a single large document. To overcome this limitation, the paper considers indexing the documents and then generating a search query and a response, based on two of the most popular open-source solutions at the moment: the Haystack and LlamaIndex frameworks. The study shows that the Haystack framework with the best settings gives more accurate answers than the LlamaIndex framework when building a corporate question-and-answer system, but it requires more tokens on average. The article uses a comparative analysis to evaluate the effectiveness of generative pre-trained language models in corporate question-and-answer systems built with the Haystack and LlamaIndex frameworks. The results were evaluated using the EM (exact match) metric. The main conclusions of the research on creating question-and-answer systems with generative pre-trained language models are:
1. Hierarchical indexing is currently extremely expensive in terms of the number of tokens used (about 160,000 tokens for hierarchical indexing versus 30,000 tokens on average for sequential indexing), since the response is generated by sequentially processing parent and child nodes.
2. Processing information with the Haystack framework with the best settings gives somewhat more accurate answers than the LlamaIndex framework (0.7 vs. 0.67 with the best settings).
3. With the Haystack framework, the accuracy of responses is less sensitive to the number of tokens in the chunk.
4. On average, the Haystack framework is about 4 times more expensive in terms of the number of tokens than the LlamaIndex framework.
5. The "create and refine" and "tree summarize" response generation modes of the LlamaIndex framework give approximately the same accuracy, but the "tree summarize" mode requires more tokens.
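The EM (exact match) metric named in the abstract can be sketched as follows. The abstract does not say how answers were normalized before comparison, so the SQuAD-style normalization here (lowercasing, dropping punctuation and English articles) is an assumption, and the function names are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this SQuAD-style normalization is NOT taken from the article.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))


def em_score(predictions: list[str], references: list[str]) -> float:
    """Average exact match over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Under this definition, a score such as the 0.7 reported for Haystack would mean that 70% of generated answers matched the reference answer after normalization.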

People often create themed collections to make sense of an ever-increasing number of archived web pages. Some of these collections contain hundreds of thousands of documents. Thousands of collections exist, many covering the same topic. Few collections include standardized metadata. This scale makes understanding a collection an expensive proposition. Our Dark and Stormy Archives (DSA) five-process model implements a novel summarization method to help users understand a collection by combining web archives and social media storytelling. The five processes of the DSA model are: select exemplars, generate story metadata, generate document metadata, visualize the story, and distribute the story. Selecting exemplars produces a set of k documents from the N documents in the collection, where k ≪ N, thus reducing the number of documents visitors need to review to understand a collection. Generating story and document metadata selects images, titles, descriptions, and other content from these exemplars. Visualizing the story ties this metadata together in a format the visitor can consume. Without distributing the story, it is not shared for others to consume. We present a research study demonstrating that our algorithmic primitives can be combined to select relevant exemplars that are otherwise undiscoverable using a conventional search engine and query generation methods. Having demonstrated improved methods for selecting exemplars, we visualize the story. Previous work established that the social card is the best format for visitors to consume surrogates. The social card combines metadata fields, including the document’s title, a brief description, and a striking image. Social cards are commonly found on social media platforms. We discovered that these platforms perform poorly for mementos and rely on web page authors to supply the necessary values for these metadata fields. With web archives, we often encounter archived web pages that predate the existence of this metadata. To generate this missing metadata and ensure that storytelling is available for these documents, we apply machine learning to generate the images needed for social cards with a Precision@1 of 0.8314. We also provide the length values needed for executing automatic summarization algorithms to generate document descriptions. Applying these concepts helps us create the visualizations needed to fulfill the final processes of story generation. We close this work with examples and applications of this technology.
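For readers unfamiliar with the Precision@1 figure quoted above, the following minimal sketch shows how such a score is typically computed: the fraction of documents whose top-ranked candidate image is one a judge marked as acceptable. The data layout and function name are hypothetical and are not taken from the DSA work.

```python
def precision_at_1(ranked_candidates: list[list[str]],
                   relevant: list[set[str]]) -> float:
    """Fraction of documents whose first-ranked candidate is relevant.

    ranked_candidates[i] is the ranked list of candidate images for document i;
    relevant[i] is the set of images judged acceptable for that document.
    A document with no candidates counts as a miss.
    """
    if len(ranked_candidates) != len(relevant):
        raise ValueError("inputs must have equal length")
    if not ranked_candidates:
        return 0.0
    hits = sum(
        1
        for ranked, rel in zip(ranked_candidates, relevant)
        if ranked and ranked[0] in rel
    )
    return hits / len(ranked_candidates)
```

Only the first position of each ranking matters here, which suits social cards because each card displays a single image.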

Articles published on Query Generation

Aspects of creating a corporate question-and-answer system using generative pre-trained language models

Dynamically retrieving knowledge via query generation for informative dialogue generation

Investigating the Support Provided by Chatbots to Educational Institutions and Their Students: A Systematic Literature Review

Summarizing Web Archive Corpora via Social Media Storytelling by Automatically Selecting and Visualizing Exemplars

A Study of Consumer Adoption of Chatbot in E-Commerce Sector in India

Resilient digital twin modeling: A transferable approach

Querying Data Exchange Settings Beyond Positive Queries

Evaluating public interest in herpes zoster in Germany by leveraging the internet: a retrospective search data analysis

LeafAI: query generator for clinical cohort discovery rivaling a human programmer

Domain-specific influence on Facebook: How topic matters when assessing influential accounts in four countries

A Demonstration of DLBD: Database Logic Bug Detection System

Lynx: A Graph Query Framework for Multiple Heterogeneous Data Sources

Visual Anomaly Detection via Partition Memory Bank Module and Error Estimation

Evaluation of Replies to Voice Queries in Gynecologic Oncology by Virtual Assistants Siri, Alexa, Google, and Cortana

Visual Question Generation Answering (VQG-VQA) using Machine Learning Models

An SQL query generator for cross-domain human language based questions based on NLP model

Transformative effects of ChatGPT on modern education: Emerging Era of AI Chatbots

Adaptive search query generation and refinement in systematic literature review

VLT: Vision-Language Transformer and Query Generation for Referring Segmentation

Detecting Logic Bugs of Join Optimizations in DBMS

