Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches

Peter Sjögårde, Per Ahlgren, Ludo Waltman

doi:10.1002/asi.24452

Abstract

Algorithmic classifications of research publications can be used to study many different aspects of the science system, such as the organization of science into fields, the growth of fields, interdisciplinarity, and emerging topics. How to label the classes in these classifications is a problem that has not been thoroughly addressed in the literature. In this study, we evaluate different approaches to label the classes in algorithmically constructed classifications of research publications. We focus on two important choices: the choice of (a) different bibliographic fields and (b) different approaches to weight the relevance of terms. To evaluate the different choices, we created two baselines: one based on the Medical Subject Headings in MEDLINE and another based on the Science‐Metrix journal classification. We tested to what extent different approaches yield the desired labels for the classes in the two baselines. Based on our results, we recommend extracting terms from titles and keywords to label classes at high levels of granularity (e.g., topics). At low levels of granularity (e.g., disciplines) we recommend extracting terms from journal names and author addresses. We recommend the use of a new approach, term frequency to specificity ratio, to calculate the relevance of terms.
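The two choices studied here can be made concrete with a small sketch. It first pools the terms of each class from a level-dependent set of bibliographic fields (titles and keywords at the topic level, journal names and author addresses at the discipline level). It then ranks the pooled terms with class-level TF-IDF. This is a minimal illustration under stated assumptions. The input format and field names are hypothetical, and terms are assumed to be already extracted. TF-IDF is used only as a familiar baseline weighting. It is not the term frequency to specificity ratio recommended above, whose formula is not given in this abstract.

```python
import math
from collections import Counter, defaultdict

# Bibliographic fields to draw terms from at each granularity level,
# following the recommendation above. The field names are hypothetical.
FIELDS_BY_LEVEL = {
    "topic": ("title", "keywords"),
    "discipline": ("journal", "address"),
}


def collect_class_terms(publications, level):
    """Pool the terms of each class from the fields chosen for `level`.

    Each publication is assumed (hypothetically) to be a dict with a
    "class" key and one list of already-extracted terms per field.
    Missing fields contribute no terms.
    """
    class_terms = defaultdict(list)
    for pub in publications:
        for field in FIELDS_BY_LEVEL[level]:
            class_terms[pub["class"]].extend(pub.get(field, []))
    return dict(class_terms)


def tfidf_labels(class_terms, top_k=3):
    """Rank each class's terms by class-level TF-IDF and keep the top_k.

    Each class is treated as one "document": tf is the term's count in the
    class, and idf is log(number of classes / classes containing the term).
    This is a baseline weighting, not the paper's TFSR.
    """
    n_classes = len(class_terms)
    # Number of classes in which each term occurs at least once.
    class_freq = Counter()
    for terms in class_terms.values():
        class_freq.update(set(terms))
    labels = {}
    for cls, terms in class_terms.items():
        scores = {t: f * math.log(n_classes / class_freq[t])
                  for t, f in Counter(terms).items()}
        # Highest score first; ties broken alphabetically for determinism.
        labels[cls] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return labels
```

For example, `tfidf_labels(collect_class_terms(pubs, "topic"), top_k=2)` labels each class from its title and keyword terms. Passing `"discipline"` instead draws only on journal names and author addresses. Under this weighting, a term that occurs in every class scores zero. That limitation is one reason to compare alternative weighting approaches rather than settle on plain TF-IDF.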

Highlights

In recent years, scientometricians have developed methods for algorithmically constructing classifications of research publications based on relations between individual publications
We restrict the study to two aspects of class labeling: the choice of (a) different bibliographic fields and (b) different approaches to weight the relevance of terms
We use two baseline classifications, one based on Medical Subject Headings (MeSH) and one based on the Science-Metrix journal classification (SMJC), to evaluate these two choices across different labeling approaches

Summary

Introduction

Scientometricians have developed methods for algorithmically constructing classifications of research publications based on relations between individual publications. This has been done for publication sets comprising tens of millions of publications (Boyack & Klavans, 2014; Sjögårde & Ahlgren, 2018; Waltman & van Eck, 2012). The obtained classifications have been used for various applications, such as identification of research topics and specialties, normalization of citations, measuring interdisciplinarity, and mapping research fields (Ahlgren, Colliander, & Sjögårde, 2018; Milanez, Noyons, & de Faria, 2016; Ruiz-Castillo & Waltman, 2015; Sjögårde & Ahlgren, 2020; Small, Boyack, & Klavans, 2014; Šubelj, van Eck, & Waltman, 2016; Wang & Ahlgren, 2018). Hierarchical classifications with labeled classes make it possible for users to browse large document collections (Cutting, Karger, Pedersen, & Tukey, 1992; Seifert, Sabol, Kienreich, Lex, & Granitzer, 2014). Perianes-Rodriguez and Ruiz-Castillo (2017) point out that

Journal: Journal of the Association for Information Science and Technology
Publication Date: Jan 23, 2021
Citations: 5
License type: CC BY 4.0

Similar Papers

Class-indexing-based term weighting for automatic text classification
Fuji Ren ... Mohammad Golam Sohrab
Information Sciences | VOL. 236 | 27 Feb 2013

PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction
Lawrence Wc Chan ... William Yl Chan
BMC Medical Informatics and Decision Making | VOL. 15 | 02 Jun 2015

Imbalanced Text Categorization Based on Positive and Negative Term Weighting Approach
Behzad Naderalvojoud ... Alaettin Ucan
01 Jan 2015

Learning to Weight for Text Classification
Alejandro Moreo ... Fabrizio Sebastiani
IEEE Transactions on Knowledge and Data Engineering | VOL. 32 | 01 Feb 2020
