Abstract

Subject categorization of scientific publications, i.e., journals, book series, or conference proceedings, has become a major concern in academia, as publication impact and ranking are considered a basic criterion for evaluating paper quality. Publishers usually propose their own categorizations, but these often include only their own publications, and their categories might not be coherent with other proposals. Also, due to the dynamic nature of science, new categories may appear frequently. As traditional mechanisms for categorization have been questioned by many authors, a new research line has emerged to improve the category assignment process. Approaches usually rely on assessing publication similarity in terms of topics, co-citation, editorial boards, and/or shared author profiles. In this work, we propose a novel procedure for the hierarchical categorization of scientific publications based on the repetition or absence of relevant descriptors in association rules among publications. The key idea is that publication categories can be automatically defined by strong associations of nuclear topics. Also, some very specific subcategories can be defined by exclusion from any set of rules. This process can be used to construct a data-driven hierarchy of scientific publication categories from scratch or to improve any existing categorization by discovering new fields. In this paper, the proposed algorithm uses the SJR descriptors of all journals in the SCImago dataset and the three-level classification in the Scopus dataset (covering only 35% of the publications in the SCImago dataset) to discover new categories and assign every journal to the resulting enhanced hierarchy. We have focused on the field of “Physical Sciences and Engineering”, using the SCImago and Scopus datasets from 2019 (30,883 scientific publications). Our procedure combines data engineering techniques with association rules and generates as a result potential new categories and outlier subcategories.
To evaluate the suitability of our proposal, we have analyzed classification results based on the original category list and our extended two-level categorization via the Jensen–Shannon divergence and supervised machine-learning techniques. Results reveal the consistency and suitability of our categorization procedure.
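
As a rough illustration of the two building blocks named above, the following Python sketch mines pairwise association rules over descriptor sets and computes the Jensen–Shannon divergence between two discrete distributions. It is not the procedure used in this work: the function names (`pair_rules`, `js_divergence`), the toy descriptors, the thresholds, and the restriction to rules with a single antecedent and a single consequent are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) over item pairs.

    Each transaction is the set of descriptors attached to one publication
    (hypothetical input format, assumed for this sketch).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of publications containing both items
        if support < min_support:
            continue
        # Test both directions of the rule: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) of two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy data: descriptor sets of four hypothetical journals.
journals = [
    {"optics", "lasers", "photonics"},
    {"optics", "lasers"},
    {"optics", "photonics"},
    {"robotics", "control"},
]
rules = pair_rules(journals, min_support=0.5, min_confidence=0.9)
```

On this toy input, the strong rules point from "lasers" and "photonics" to "optics", which hints at how a shared descriptor could act as the core of a category, while the fourth journal takes part in no rule and would be treated as an outlier. The divergence function can then compare, for instance, the descriptor distributions of two candidate categories; a value near 0 suggests they are hard to tell apart.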
