Abstract

BackgroundGraph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web.ResultsWe conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.ConclusionThe link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.

Highlights

  • Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks

  • 2.1 Experimental Design Retrieval experiments were conducted using the test collection from the Text Retrieval Conferences (TRECs) 2005 genomics track [4], which used a ten-year subset of MEDLINE

  • This work examines two well-known algorithms that exploit link structure to score the importance of nodes in a hyperlink graph such as the Web: PageRank [1] and HITS [2]

Read more

Summary

Introduction

Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. Documents do not exist in isolation – their environments provide an important source of evidence for ranking results with respect to a user's query This insight is captured in algorithms such as PageRank [1] and HITS [2] ( known as "hubs and authorities"). Experiments show that incorporating evidence extracted from such networks yields statistically significant improvements in document retrieval effectiveness, as measured by standard rankedretrieval metrics.

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.