Abstract

Bibliographic coupling (BC) is a similarity measure for scientific articles. It rests on the expectation that two articles citing a similar set of references are likely to focus on related (or even the same) research issues. BC is an essential measure for the analysis and mapping of scientific literature, and it can also be integrated with other kinds of measures. Further improvement of BC is thus of both practical and technical significance. In this paper, we propose a novel measure that improves BC by tackling its main weakness: two related articles may still cite different references. Category-based cocitation (category-based CC) is proposed to estimate how these different references are related to each other, based on the assumption that two different references may be related if they are cited by articles in the same categories about specific topics. The proposed measure is thus named BCCCC (Bibliographic Coupling with Category-based Cocitation). The performance of BCCCC is evaluated by experimentation and a case study. The results show that BCCCC performs significantly better than state-of-the-art variants of BC in identifying highly related articles, that is, articles that report conclusive results on the same specific topics. An experiment also shows that BCCCC provides helpful information for further improving a biomedical search engine. BCCCC is thus an enhanced version of BC, which is a fundamental measure for retrieval and analysis of scientific literature.
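The assumption behind category-based CC can be illustrated with a minimal sketch. This is not the formula used by BCCCC, which the abstract does not state. It only counts the categories in which two references are both cited, and the function name, the input dictionaries, and the scoring rule are all hypothetical.

```python
from collections import defaultdict


def shared_citing_categories(ref1, ref2, citations, categories):
    """Count categories whose articles cite both ref1 and ref2.

    citations:  dict mapping article id -> set of reference ids it cites
    categories: dict mapping article id -> set of category labels
    Both inputs and the counting rule are illustrative assumptions,
    not the definition used by BCCCC.
    """
    # Collect, for each reference, the categories of the articles citing it.
    ref_categories = defaultdict(set)
    for article, refs in citations.items():
        for ref in refs:
            ref_categories[ref] |= categories.get(article, set())

    # Two references are taken to be related when some category
    # contains articles citing each of them.
    cats1 = ref_categories.get(ref1, set())
    cats2 = ref_categories.get(ref2, set())
    return len(cats1 & cats2)


citations = {"a1": {"r1"}, "a2": {"r2"}, "a3": {"r3"}}
categories = {"a1": {"c1"}, "a2": {"c1"}, "a3": {"c2"}}
related = shared_citing_categories("r1", "r2", citations, categories)
unrelated = shared_citing_categories("r1", "r3", citations, categories)
```

In this toy input, a1 and a2 share no reference, so plain BC would treat them as unrelated, while r1 and r2 are linked through the common category c1.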

Highlights

  • Given two scientific articles a1 and a2, bibliographic coupling (BC) is a measure to estimate the similarity between a1 and a2 by considering how a1 and a2 cite a similar set of references [1]

  • The results show that BCCCC performs significantly better than state-of-the-art variants of BC in identifying highly related articles, which report conclusive results on the same specific topics

  • The results show that BCCCC performs significantly better than each baseline on all evaluation criteria: Mean Average Precision (MAP) and Average P@X (X = 1, 3, 5, and 10)
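
The evaluation criteria named in the highlights can be sketched as follows. This uses the textbook definitions of precision at a cutoff and average precision; the exact variants used in the paper are not given here, so the function names and the handling of edge cases are assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant (P@k)."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of P@i over the ranks i at which a relevant item appears."""
    if not relevant:
        return 0.0  # assumed convention for queries with no relevant items
    hits = 0
    total = 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p1 = precision_at_k(ranked, relevant, 1)
p3 = precision_at_k(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
map_score = mean_average_precision([(ranked, relevant), (["x", "y"], {"y"})])
```

Average P@X would then be the mean of `precision_at_k` over all test queries at a fixed cutoff X.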


Summary

Introduction

Given two scientific articles a1 and a2, bibliographic coupling (BC) is a measure that estimates the similarity between a1 and a2 by considering the extent to which they cite a similar set of references [1]. Text-based measures often extract terms from the textual contents of articles and estimate the similarity between articles using several factors that are commonly employed in information retrieval studies. These factors concern each term t, each article a, and how t appears in a. Other factors include occurrence of the stem of t (i.e., the base or root form of t), positions of t in a, and key terms specified for a. These factors, together with some factors noted above (TF, IDF, and article length), were employed by the article recommendation service provided by PubMed, which is a popular biomedical search engine [23,24]. This service was found to be one of the best at clustering scientific articles [22].
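
The definition of BC that opens this section can be sketched in a few lines. The raw count of shared references follows directly from that definition. The Jaccard-style normalization is one common choice and is an assumption here, not necessarily the variant used in the paper, and both function names are hypothetical.

```python
def bibliographic_coupling(refs1, refs2):
    """Raw BC strength: number of references cited by both articles."""
    return len(set(refs1) & set(refs2))


def normalized_coupling(refs1, refs2):
    """Shared references divided by all distinct references (Jaccard).

    The choice of normalization is an illustrative assumption.
    """
    s1, s2 = set(refs1), set(refs2)
    union = s1 | s2
    if not union:
        return 0.0  # neither article cites anything
    return len(s1 & s2) / len(union)


a1_refs = ["r1", "r2", "r3"]
a2_refs = ["r2", "r3", "r4"]
raw = bibliographic_coupling(a1_refs, a2_refs)
norm = normalized_coupling(a1_refs, a2_refs)
```

Two articles with disjoint reference lists score zero under both functions, which is the weakness of BC that the abstract describes.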

