Abstract

Recent large-scale bibliometric models have largely been based on direct citation, and several recent studies have explored augmenting direct citation with other citation-based or textual characteristics. In this study we compare clustering results from direct citation, extended direct citation, a textual relatedness measure, and several citation-text hybrid measures using a set of nine million documents. Three different accuracy measures are employed, one based on references in authoritative documents, one using textual relatedness, and the last using document pairs linked by grants. We find that a hybrid relatedness measure based equally on direct citation and PubMed-related article scores gives more accurate clusters (in the aggregate) than the other relatedness measures tested. We also show that the differences in cluster contents between the different models are even larger than the differences in accuracy, suggesting that the textual and citation logics are complementary. Finally, we show that for the hybrid measure based on direct citation and related article scores, the larger clusters are more oriented toward textual relatedness, while the smaller clusters are more oriented toward citation-based relatedness.

Highlights

  • With the increasing availability of large-scale bibliographic data and the increased capacity of algorithms to cluster these data, highly detailed science maps and models are becoming ever more common

  • We find that a hybrid relatedness measure based on direct citation and PubMed-related article scores gives more accurate clusters than the other relatedness measures tested

  • We show that for the hybrid measure based on direct citation and related article scores, the larger clusters are more oriented toward textual relatedness, while the smaller clusters are more oriented toward citation-based relatedness

Introduction

With the increasing availability of large-scale bibliographic data and the increased capacity of algorithms to cluster these data, highly detailed science maps and models are becoming ever more common. Few have attempted to compare the cluster-level results of such studies, with the work by Velden, Boyack, et al. (2017) as a notable exception. Those creating models of science have always sought to establish the validity of their models in some way, but quantitative studies of accuracy are a more recent occurrence for large-scale models. The first such study using a very large literature data set was done by Boyack and colleagues using a set of 2.15 million PubMed documents (Boyack & Klavans, 2010; Boyack, Newman, et al., 2011). It compared text-based, citation-based, and hybrid relatedness measures, where titles, abstracts, and MeSH terms were obtained from PubMed, while references for each document were obtained from Scopus via matching database records. Words that occurred in at least four and not more than 500 documents were included in the calculation.
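The two ingredients described above, a document-frequency filter on the vocabulary and an equal-weight combination of citation-based and text-based relatedness, can be sketched in a few lines. This is a minimal illustration only: the toy corpus, the Jaccard text similarity (standing in for the PubMed related-article scores), and the document-frequency thresholds (scaled down from the 4–500 range used in the study) are all assumptions for the sake of the example, not the paper's actual pipeline.

```python
from collections import Counter

# Toy corpus: document id -> bag of words (in the study: titles, abstracts, MeSH terms).
docs = {
    "d1": ["citation", "clustering", "network"],
    "d2": ["citation", "network", "graph"],
    "d3": ["text", "similarity", "clustering"],
}
# Direct-citation links as unordered pairs (hypothetical data).
cites = {("d1", "d2")}

# Vocabulary filter: keep words occurring in at least MIN_DF and at most MAX_DF
# documents (the study used 4 and 500; scaled down here for the toy corpus).
MIN_DF, MAX_DF = 2, 3
df = Counter(w for words in docs.values() for w in set(words))
vocab = {w for w, c in df.items() if MIN_DF <= c <= MAX_DF}

def text_sim(a: str, b: str) -> float:
    """Jaccard similarity over the filtered vocabulary (a simple stand-in
    for the related-article scores used in the paper)."""
    sa = set(docs[a]) & vocab
    sb = set(docs[b]) & vocab
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def hybrid_sim(a: str, b: str, w: float = 0.5) -> float:
    """Hybrid relatedness: weight w on direct citation, (1 - w) on text.
    w = 0.5 mirrors the equal-weight hybrid measure discussed in the paper."""
    dc = 1.0 if (a, b) in cites or (b, a) in cites else 0.0
    return w * dc + (1 - w) * text_sim(a, b)
```

In a real implementation the resulting pairwise scores would feed a clustering algorithm such as the Leiden or smart local moving algorithm; here `hybrid_sim("d1", "d2")` simply combines a citation link with the textual overlap of the two documents.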
