Community question retrieval in health forums

Hamman Samuel, Mi-Young Kim, Sankalp Prabhakar, Mohomed Shazan Mohomed Jabbar, Osmar Zaïane

doi:10.1109/bhi.2017.7897209

Abstract

Community Question Answering (CQA) has emerged as a popular type of service that enables users to ask and answer questions and to access an existing knowledge base. CQA archives contain a large amount of useful user-generated content and have been recognized as important information resources on the web. Effective and efficient question retrieval is needed to make the knowledge in these archives more accessible. Question retrieval in a CQA archive aims to identify and retrieve existing questions that are relevant to a new user question. The objective of this study is to develop a question retrieval system that can sift through such forums and identify the existing questions most similar to a user-provided question. We focus on health forums and propose a CQA system that uses weighted TF-IDF, relevance heuristics, and term expansion. We compare our proposed algorithm against other well-known methods and demonstrate that it outperforms the Latent Dirichlet Allocation (LDA) topic model, Latent Semantic Indexing (LSI), language model-based information retrieval, BM25, the vector space model, Word2Vec, and semantic similarity approaches. Our initial experiments use datasets from the IEEE Healthcare Data Analytics Challenge 2015. We also present our efforts toward developing a Bronze Standard for question similarity evaluation, using self-annotations and annotations provided by affiliates of Mayo Clinic.
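The abstract names weighted TF-IDF and term expansion but does not specify them. The sketch below shows plain TF-IDF cosine retrieval with dictionary-based term expansion. It assumes a hypothetical synonym table `SYNONYMS` and a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, and it does not reproduce the authors' weighting scheme or relevance heuristics. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table for term expansion (an assumption: the
# expansion resource used in the paper is not described in the abstract).
SYNONYMS = {
    "hypertension": ["high", "blood", "pressure"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(tokens, synonyms=SYNONYMS):
    """Append the expansion terms for every token found in the synonym table."""
    out = list(tokens)
    for t in tokens:
        out.extend(synonyms.get(t, []))
    return out


def build_idf(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms unseen in the archive get weight 0."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, archive, top_k=3):
    """Return (score, question) pairs for the archived questions most similar to query."""
    docs = [expand(tokenize(q)) for q in archive]
    idf = build_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    qv = tfidf(expand(tokenize(query)), idf)
    scored = sorted(
        ((cosine(qv, v), q) for v, q in zip(vecs, archive)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:top_k]


if __name__ == "__main__":
    archive = [
        "What foods help lower high blood pressure?",
        "Is it safe to exercise with a knee injury?",
        "Can stress cause hypertension?",
    ]
    for score, question in rank("diet for hypertension", archive, top_k=2):
        print(f"{score:.3f}  {question}")
```

In this toy archive, expansion lets the query "diet for hypertension" match the question about high blood pressure, even though the two share no surface term. In this sketch, expansion is applied to both the archive and the query so that the two vocabularies line up. A real system would replace the hand-written table with a curated medical vocabulary.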

Full Text