Abstract
The Open-domain Question-Answering (QA) task requires a QA system to answer a given question using a large knowledge base such as Wikipedia. Modern Open-domain QA systems often follow the two-stage Retriever-Reader framework, in which the retriever greatly impacts end-to-end performance. Efficient Vietnamese Open-domain QA systems for single-hop and multi-hop questions have yet to be studied. Although resource-rich languages like English have witnessed many advancements in Open-domain QA, these methods often perform poorly in low-data settings. This study proposes ViWiQA, an efficient Vietnamese Open-domain QA system over the Wikipedia knowledge base, with two novel retriever methods for single-hop and multi-hop questions. ViWiQA can be trained effectively with little data and significantly outperforms Lucene-BM25 and Dense Passage Retrieval when these are adapted to Vietnamese datasets. For single-hop QA, the proposed retriever outperforms Lucene-BM25 by 20% in top-1 retrieval accuracy, and the end-to-end system achieves absolute gains of 15% and 17% in EM and F1 scores, respectively. For multi-hop QA, the proposed retriever increases the accuracy of retrieving correct passage pairs by 4% compared to Lucene-BM25, and the end-to-end system shows absolute gains of 7% and 17% in EM and F1 scores, respectively.