Abstract

Background: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently.

Methods: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles.

Results: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php.

Conclusions: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
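The ontology-graph based query expansion described in the abstract can be illustrated with a minimal sketch. This is not G-Bean's implementation: the tiny concept graph, the function names `personalized_pagerank` and `expand_query`, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration, and the TF-IDF re-ranking step is omitted. The sketch only shows how Personalized PageRank, seeded on the concepts found in a query, might score neighbouring concepts so that the top-ranked ones can be added to the query.

```python
from collections import defaultdict

# Hypothetical toy ontology graph; G-Bean's real graph merges
# MeSH, SNOMEDCT, CSP and AOD and is far larger.
EDGES = [
    ("heart attack", "myocardial infarction"),
    ("myocardial infarction", "coronary disease"),
    ("coronary disease", "cardiovascular disease"),
    ("cardiovascular disease", "hypertension"),
    ("asthma", "lung disease"),  # separate component, unrelated to the seed
]


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    The random walk restarts only at the seed concepts, so scores
    measure closeness to the query rather than global importance.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    seeds = [s for s in seeds if s in neighbors]
    if not seeds:
        raise ValueError("no seed concept is present in the graph")

    # Restart (teleport) distribution: uniform over the seed concepts.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            # Each node passes its damped score equally to its neighbours.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


def expand_query(edges, seeds, top_k=3):
    """Return the top_k non-seed concepts by Personalized PageRank score."""
    rank = personalized_pagerank(edges, seeds)
    candidates = [(score, c) for c, score in rank.items() if c not in seeds]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in candidates[:top_k]]


if __name__ == "__main__":
    print(expand_query(EDGES, ["heart attack"], top_k=2))
```

In this toy graph, concepts closest to the seed "heart attack" receive the highest scores, while concepts in the disconnected "asthma" component receive none. G-Bean, per the abstract, keeps the top 500 concepts after an additional TF-IDF re-ranking rather than a handful.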

Highlights

  • Most people use National Center for Biotechnology Information (NCBI)’s PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information

  • It has been observed that existing query expansion approaches [14,15] offer no significant advantages over the free-text based search methods; missing concepts and incomplete synonym sets were found to be the major causes of the inadequacy of existing query expansion schemes

  • To evaluate G-Bean’s search performance, we conducted a subjective evaluation using the 106 benchmark queries from the OHSUMED dataset, which were generated by clinicians in the course of their patient care


Summary

Introduction

Most people use NCBI’s PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. Building a Web-based tool to find relevant biomedical literature in the MEDLINE database in response to a query remains a challenge due to the increase in volume and diversity of topics of [...] not utilize it as effectively as experienced users [5,6,7]. Those less-experienced users either fail to employ the most relevant context-sensitive keywords or fail to effectively formulate query expressions using Boolean logic [8,9]. [...] It has been observed that these query expansion approaches [14,15] offer no significant advantages over the free-text based search methods; missing concepts and incomplete synonym sets (due to the use of only the MeSH ontology) were found to be the major causes of the inadequacy of existing query expansion schemes.

