The effect of clustering algorithms on question answering

Rana Husni Almahmoud, Marwah Alian

doi:10.1016/j.eswa.2023.122959

Abstract

Question answering (QA) is an essential field of information retrieval in which specific answers are returned instead of large documents. The relations among questions and answers are determined using natural language processing techniques, while clustering algorithms can improve retrieval effectiveness by reducing the number of comparisons required for a specific question or answer. In this work, we introduce a clustering-based approach for a QA system. The approach groups related questions into clusters using different clustering algorithms, matches each answer to a cluster using similarity measures between the answers and the generated clusters, and then assigns each answer to its most related question. Several clustering algorithms are tested: k-means, spherical k-means, single-linkage hierarchical clustering (SLHA), unweighted pair group method with arithmetic mean (UPGMA), expectation–maximization (EM), and clustering Arabic documents based on bond energy (CADBE). The effectiveness of each clustering algorithm is investigated with respect to four factors: the number of clusters, the text representation, the similarity measure between answers and clusters, and the similarity measure between answers and questions in a selected cluster. In addition, a comprehensive ranking system is introduced to evaluate the performance of the clustering algorithms. Evaluation is performed using the Dataset of Arabic Why Question Answering System (DAWQAS) and the Multilingual Question Answering (MLQA) dataset. Results show that CADBE achieves the highest accuracy and the first rank, followed by SLHA and UPGMA, while spherical k-means has the lowest rank. The performance of the clustering algorithms on the MLQA dataset is affected by its characteristics, such as short questions, long and varied answers, and diverse subject domains. Unigram and bigram intersection measures perform well in most cases. The term frequency-inverse document frequency (TF-IDF) representation outperforms word embeddings on DAWQAS. Overall, the experiments provide insights into the performance of clustering algorithms in QA systems.
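The two-stage matching described in the abstract (cluster the questions, pick a cluster for each answer, then compare the answer only with the questions in that cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenization, a Jaccard-style n-gram overlap as a stand-in for the unigram and bigram intersection measures, and a naive single-linkage merge loop. The function names `ngrams`, `overlap`, `single_linkage`, and `assign_answer` are hypothetical.

```python
from itertools import combinations


def ngrams(text, n=1):
    """Return the set of word n-grams in text (whitespace tokenization, assumed)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(a, b, n=1):
    """Jaccard-style n-gram intersection score in [0, 1] (an assumed definition)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def single_linkage(questions, k):
    """Merge the two closest clusters until k remain.

    Closeness between two clusters is the best pairwise overlap between
    their members, which is the single-linkage criterion.
    """
    clusters = [[i] for i in range(len(questions))]
    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(
                overlap(questions[i], questions[j])
                for i in clusters[p[0]]
                for j in clusters[p[1]]
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def assign_answer(answer, questions, clusters):
    """Return the index of the question judged most related to answer."""
    # Stage 1: choose the cluster whose pooled questions overlap most with the answer.
    best_cluster = max(
        clusters,
        key=lambda c: overlap(answer, " ".join(questions[i] for i in c)),
    )
    # Stage 2: compare the answer only with the questions inside that cluster.
    return max(best_cluster, key=lambda i: overlap(answer, questions[i]))
```

With `c` clusters of roughly equal size over `q` questions, each answer is compared with about `c + q / c` items instead of all `q` questions, which is the reduction in comparisons the abstract refers to. Passing `n=2` to `overlap` switches the sketch from unigram to bigram matching.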

Full Text