Abstract
Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes), with little effort devoted to discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that PubMed users employ to represent semantic relations between entities, such as ‘CHEMICAL-1 compared to CHEMICAL-2’. Taking advantage of advances in automatic named entity recognition, we first tag entities in PubMed queries and then use the tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid data sparseness and specificity issues. Finally, we mine semantically similar context patterns, or semantic relations, based on LSA topic distributions. Our two separate evaluation experiments on chemical–chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 for the CC and CD tasks, respectively, when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted.
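The nDCG measure cited in the abstract can be illustrated with a minimal sketch. It assumes the common formulation with a log2 rank discount and linear gains; the abstract does not state which variant the authors used, and the function names `dcg` and `ndcg` are illustrative only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of graded relevance scores given in rank order."""
    # Rank 1 is divided by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg(gains, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking.

    Assumption: linear gains and a log2 discount; other variants exist.
    """
    k = len(gains) if k is None else k
    ideal = sorted(gains, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance grades for a ranked list of returned patterns.
perfect_ranking = ndcg([3, 2, 1])
swapped_ranking = ndcg([0, 1])
```

A ranking that already lists patterns in decreasing relevance scores 1.0, and any misordering lowers the score, so values near 0.9 indicate rankings close to the ideal order.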
Highlights
Many natural language queries are submitted to search engines on the Web every day, and an increasing number of online search engines target domain-specific search services.
We focus on semantically understanding PubMed queries with exactly two bio-entities, because bioNLP research on entity relations has long focused on relations between pairs of entities: chemical–disease relations [2], protein–protein interaction [3], gene events [4], drug–drug interaction [5] and disease co-morbidities [6].
The dotted lines represent the performance of our baseline, which estimated the semantic similarity of patterns by the cosine similarity of their specific participating entity pairs in the queries, without using latent semantic analysis (LSA) topic information.
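The baseline described in the last highlight can be sketched as follows. This is a rough illustration under stated assumptions: each pattern is represented by counts of the entity pairs observed with it, and the counts and the third pattern are made-up toy data, not figures from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy data: how often each entity pair was seen with each pattern.
pattern_pairs = {
    "CHEMICAL-1 vs CHEMICAL-2": Counter({
        ("chlorthalidone", "hydrochlorothiazide"): 3,
        ("aspirin", "ibuprofen"): 1,
    }),
    "CHEMICAL-1 versus CHEMICAL-2": Counter({
        ("chlorthalidone", "hydrochlorothiazide"): 2,
    }),
    "CHEMICAL-1 and CHEMICAL-2 interaction": Counter({
        ("warfarin", "aspirin"): 4,
    }),
}

same_relation = cosine(pattern_pairs["CHEMICAL-1 vs CHEMICAL-2"],
                       pattern_pairs["CHEMICAL-1 versus CHEMICAL-2"])
no_shared_pairs = cosine(pattern_pairs["CHEMICAL-1 vs CHEMICAL-2"],
                         pattern_pairs["CHEMICAL-1 and CHEMICAL-2 interaction"])
```

Two patterns that never share an exact entity pair score zero under this baseline even when they express the same relation, which is the sparseness problem that projecting patterns to LSA topics is meant to reduce.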
Summary
Many natural language queries are submitted to search engines on the Web every day, and an increasing number of online search engines target domain-specific search services. Although the queries ‘chlorthalidone vs hydrochlorothiazide’ and ‘chlorthalidone versus hydrochlorothiazide’ are similar, PubMed returns 2.5 times more relevant articles when users compare these two drugs using ‘versus’ than when they use ‘vs’. Such differences in retrieval effectiveness may be reduced, and levels of user satisfaction maintained, if queries of similar semantic meaning were presented at search time. In this regard, this paper aims to discover semantic relations between bio-concepts (such as chemicals and diseases) on the Web, with the goal of supporting biocuration and improving retrieval effectiveness. We focus on semantically understanding PubMed queries with exactly two bio-entities, because bioNLP research on entity relations has long focused on relations between pairs of entities: chemical–disease relations [2], protein–protein interaction [3], gene events [4], drug–drug interaction [5] and disease co-morbidities [6].
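Turning a tagged two-entity query into a context pattern might look like the sketch below. It assumes an entity tagger has already produced character spans with a type label; the `(start, end, type)` span format, the helper `span`, and the per-type placeholder numbering are assumptions for illustration, guided only by the ‘CHEMICAL-1 compared to CHEMICAL-2’ example in the abstract.

```python
from collections import Counter


def span(query, mention, entity_type):
    """Hypothetical stand-in for a tagger: locate one mention by exact match."""
    start = query.find(mention)
    return (start, start + len(mention), entity_type)


def to_context_pattern(query, entities):
    """Replace the two tagged mentions with typed placeholders.

    Returns (pattern, mentions), or None when the query does not contain
    exactly two entities, since only such queries are considered.
    """
    if len(entities) != 2:
        return None
    ordered = sorted(entities)  # left-to-right by start offset
    seen = Counter()            # placeholder index is counted per entity type
    parts, cursor = [], 0
    for start, end, entity_type in ordered:
        seen[entity_type] += 1
        parts.append(query[cursor:start])
        parts.append(f"{entity_type}-{seen[entity_type]}")
        cursor = end
    parts.append(query[cursor:])
    mentions = tuple(query[start:end] for start, end, _ in ordered)
    return "".join(parts), mentions


query = "chlorthalidone vs hydrochlorothiazide"
tagged = [span(query, "chlorthalidone", "CHEMICAL"),
          span(query, "hydrochlorothiazide", "CHEMICAL")]
pattern, mentions = to_context_pattern(query, tagged)
# pattern == "CHEMICAL-1 vs CHEMICAL-2"
```

Keeping the mentions alongside the pattern preserves the participating entity pair, which is the information a pattern-by-entity-pair representation would be built from.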