Abstract

With the proliferation of biomedical literature, it is challenging for biomedical scientists to keep up with new advancements. In biomedical literature retrieval systems, the keywords in user-defined queries often appear in various lexical variants, which leads to vocabulary mismatch (VM). One way to cope with this issue is a query expansion (QE) framework that enriches the original query with auxiliary, semantically similar terms for each keyword it mentions. In this research, we propose a biomedical QE framework to alleviate VM. The proposed approach combines clinical diagnosis information (CDI) with word embeddings (WEs) to retrieve relevant biomedical literature. Word embedding, the process of embedding vocabulary terms as real-valued, low-dimensional vectors, has garnered significant attention for its potential to capture implicit semantics. We exploited three kinds of word embeddings (domain-specific, domain-agnostic, and hybrid) and integrated their outputs with the CDI to obtain the best query combination for efficient retrieval of biomedical literature. Experimental results on the Text REtrieval Conference (TREC) dataset showed that CDI, when used with the hybrid word embeddings, surpassed the WEs trained on domain-specific and domain-agnostic data. The results demonstrate that merging these two techniques is a valuable addition to the QE process, leading to significantly improved precision and reduced VM in biomedical literature retrieval. We hope that our approach will help investigators use this query combination to retrieve relevant articles.
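To make the idea of embedding-based query expansion concrete, the following is a minimal sketch and not the framework proposed here. It assumes a toy embedding table with made-up 3-dimensional vectors, a hypothetical `expand_query` helper that adds the nearest neighbours of each keyword by cosine similarity, and a hypothetical `build_query` helper that simply appends diagnosis terms as a stand-in for CDI. The abstract does not specify how CDI and the embeddings are actually integrated, so that step is an assumption.

```python
import math

# Toy embeddings: the vectors are invented for illustration only and do not
# come from any trained domain-specific, domain-agnostic, or hybrid model.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "cancer": [0.8, 0.2, 0.1],
    "fever": [0.1, 0.9, 0.0],
    "pyrexia": [0.12, 0.88, 0.05],
    "aspirin": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest neighbours (similarity >= min_sim) per term.

    k and min_sim are hypothetical parameters chosen for this sketch.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary keywords are kept but not expanded
        scored = [
            (cosine(embeddings[term], vec), cand)
            for cand, vec in embeddings.items()
            if cand != term
        ]
        scored.sort(reverse=True)
        for sim, cand in scored[:k]:
            if sim >= min_sim and cand not in expanded:
                expanded.append(cand)
    return expanded


def build_query(keywords, cdi_terms, embeddings):
    """Assumed combination step: expanded keywords plus diagnosis terms."""
    query = expand_query(keywords, embeddings)
    for term in cdi_terms:
        if term not in query:
            query.append(term)
    return query


if __name__ == "__main__":
    print(build_query(["tumor", "fever"], ["aspirin"], TOY_EMBEDDINGS))
```

The similarity threshold keeps weakly related neighbours out of the expanded query, which matters because every added term can also pull in irrelevant documents.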
