Abstract

The age of researchers is a critical factor in studying the bibliometric characteristics of the scholars who produce new knowledge. In bibliometric studies, the age of scientific authors is generally missing; however, the year of first publication is frequently used as a proxy for the age of researchers. In this article, we investigate which bibliometric factors are most important for predicting the age of researchers (birth and PhD age). Using a dataset of 3574 researchers from Quebec for whom Web of Science publications, year of birth and year of PhD are known, our analysis falls under the linear regression setting and focuses on the predictive power of various regression models rather than on data fitting, also considering a breakdown by field. The year of first publication proves to be the best linear predictor for the age of researchers. When using simple linear regression models, predicting birth and PhD years results in errors of about 3.7 years and 3.9 years, respectively. Including other bibliometric data marginally improves the predictive power of the regression models. A validation analysis for the field breakdown shows that the average length of the prediction intervals varies from 2.5 years for Basic Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average models perform significantly better than the models using individual observations. Nonetheless, the high variability of the data and the uncertainty inherited by the models advise caution when using linear regression models for predicting the age of researchers.

Highlights

  • Several sociodemographic factors have been shown to affect researchers’ scholarly output and impact (Costas & Bordons, 2011; Gingras, Larivière, Macaluso, & Robitaille, 2008; Mauleón & Bordons, 2006).

  • We aim to assess how reliable the estimation of the real ages of scholars is when based on models that rely exclusively on bibliometric indicators (such as the year of first publication, author order, co-authors, document types published, etc.).

  • Accounting for other information marginally increases the performance of the linear models. To further validate this conclusion, we investigate the performance of the simple linear regression models by splitting the dataset randomly into two datasets (A and B).
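The split-sample check described in the last highlight can be illustrated with a minimal sketch. It assumes a simple linear regression of birth year on year of first publication, fitted on one random half (A) and evaluated on the other half (B). The function names `fit_simple_ols` and `split_validate` are hypothetical, the error measure (root mean squared error) is an assumption, and the data are synthetic values made up for illustration; none of this is the authors' code or the Quebec dataset.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def split_validate(first_pub_year, birth_year, seed=0):
    """Fit on a random half (A) and report the RMSE on the other half (B)."""
    x = np.asarray(first_pub_year, dtype=float)
    y = np.asarray(birth_year, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    half = len(x) // 2
    idx_a, idx_b = order[:half], order[half:]
    intercept, slope = fit_simple_ols(x[idx_a], y[idx_a])
    residuals = y[idx_b] - (intercept + slope * x[idx_b])
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return intercept, slope, rmse


if __name__ == "__main__":
    # Synthetic data for illustration only: the 28-year gap and the
    # 4-year noise level are arbitrary assumptions, not results.
    rng = np.random.default_rng(42)
    first_pub = rng.integers(1970, 2010, size=1000)
    birth = first_pub - 28 + rng.normal(0.0, 4.0, size=1000)
    a, b, rmse = split_validate(first_pub, birth)
    print(f"slope={b:.2f}, held-out RMSE={rmse:.2f} years")
```

Evaluating on the held-out half rather than on the fitting half reflects the emphasis on predictive power over data fitting; the same pattern would apply with PhD year as the response.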


Summary

Introduction

Several sociodemographic factors have been shown to affect researchers’ scholarly output and impact (Costas & Bordons, 2011; Gingras, Larivière, Macaluso, & Robitaille, 2008; Mauleón & Bordons, 2006). One of the central sociodemographic characteristics of scholars is their age (Costas & Bordons, 2011; Gingras et al., 2008; Levin & Stephan, 1989), as it has been shown to be a key predictor of research productivity (Bornmann & Leydesdorff, 2014; Falagas, Ierodiakonou, & Alexiou, 2008; Levin & Stephan, 1989). This variable is generally not included in bibliometric analyses because it is rarely available. We aim to assess how reliable the estimation of the real ages of scholars is when based on models that rely exclusively on bibliometric indicators (such as the year of first publication, author order, co-authors, document types published, etc.).

