Goal-oriented Document-grounded Dialogue (DGD) helps users retrieve domain-specific documents, find content within them, answer questions, and manage documents. Existing methods typically employ keyword extraction and vector space models to understand document content and identify the intent of questions, and rely on generation models to produce answers. However, challenges remain in semantic understanding, long-text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in in-context learning and step-by-step reasoning. These models, combined with Retrieval-Augmented Generation (RAG) methods, have made significant breakthroughs in text comprehension, intent detection, and language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper, we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. We design and implement methods for document content retrieval and reranking, fine-tuning and inference, and evidence generation. In experiments, we compare against two baselines: LLMs combined with a vector space model, and LLMs combined with a key information matching technique. Relative to these baselines, the proposed method improves accuracy by 21.91% and 12.81%, comprehensiveness by 10.89% and 69.83%, coherence by 38.98% and 53.27%, and completeness by 16.13% and 36.97%, respectively, on average. Additionally, an ablation analysis reveals that the evidence generation method also contributes significantly to comprehensiveness and completeness.