Abstract

Question answering (QA) systems can serve as vital tools for addressing lay users’ information needs in healthcare. Although QA systems can reduce information overload and provide quality answers, their performance must be evaluated holistically. Here we propose four evaluation dimensions for this purpose: lexical similarity, semantic similarity, absence of contradictions, and readability of responses. We then use these dimensions to evaluate DiseaseGuru, a chronic disease QA system that we developed, which is based on a generative large language model and integrates knowledge graph technology to provide quality responses to lay users. We report results comparing DiseaseGuru with three benchmark algorithms across these dimensions. We also propose metrics, for lay users and for medical professionals, to be used in a future field study of the system.

