Of the existing theoretical formulas for the h-index (Hirsch, Proc Natl Acad Sci USA 102:16569–16572, 2005), those recently proposed by Burrell (J Informetr 7:774–783, 2013b) and by Bertoli-Barsotti and Lando (J Informetr 9(4):762–776, 2015) have proved very effective in estimating its actual value, at least at the level of the individual scientist. These approaches lead (or may lead) to two slightly different formulas, since they are based, respectively, on a “standard” and a “shifted” version of the geometric distribution. In this paper, we review the genesis of these two formulas, which we call the “basic” and the “improved” Lambert-W formula for the h-index. By means of an empirical study, we then compare their effectiveness with that of several instances of the well-known Glänzel–Schubert class of models for the h-index, which are based instead on a Paretian model. All the formulas considered in the comparison are “ready-to-use”, i.e., functions of simple citation indicators such as the total number of publications, the total number of citations, the total number of cited papers, and the number of citations of the most cited paper. The empirical study is based on citation data from two sets of journals belonging to two different scientific fields: 231 journals from the area of “Statistics and Mathematical Methods” and 100 journals from the area of “Economics, Econometrics and Finance”, totaling almost 100,000 and 20,000 publications, respectively. The citation data refer to different publication/citation time windows, different types of “citable” documents, and alternative approaches to the analysis of the citation process (“prospective” and “retrospective”).
We conclude that the Lambert-W formula for the h-index, especially in its improved version, provides a fairly robust and effective ready-to-use rule. It should be preferred to other known formulas if the goal is simply to derive a reliable estimate of the h-index.
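The abstract does not state the formulas explicitly. The Python sketch below is an illustration only, and it makes several assumptions. It assumes that the "basic" variant comes from a geometric citation model on {0, 1, 2, …} with mean μ = C/T, where T is the number of publications and C is the number of citations. Under that model, the expected number of papers with at least h citations is T·q^h, with q = μ/(1 + μ). Solving T·q^h = h gives h = W(T·ln(1/q)) / ln(1/q), where W is the principal branch of the Lambert W function. This derivation and the helper names `lambert_w0` and `h_basic_lambert` are hypothetical. They are not necessarily the exact formula used by Burrell or in the paper. The "improved" shifted-geometric variant is not covered here.

```python
import math


def lambert_w0(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch W0(x) for x >= 0, via Newton's method on w * e**w = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess, at or above the root for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step: f(w) / f'(w)
        w -= step
        if abs(step) < tol:
            break
    return w


def h_basic_lambert(total_pubs: int, total_cites: int) -> float:
    """Hypothetical 'basic' Lambert-W estimate of h under an assumed geometric model.

    Assumes citations per paper are geometric on {0, 1, 2, ...} with mean
    mu = total_cites / total_pubs, so P(X >= k) = q**k with q = mu / (1 + mu).
    Solves total_pubs * q**h = h for h.
    """
    if total_pubs <= 0:
        raise ValueError("total_pubs must be positive")
    mu = total_cites / total_pubs  # mean citations per paper
    if mu == 0:
        return 0.0
    a = math.log1p(1.0 / mu)  # a = ln(1/q) = ln(1 + 1/mu)
    return lambert_w0(total_pubs * a) / a  # h = W(T * a) / a


if __name__ == "__main__":
    print(h_basic_lambert(50, 500))
```

For example, under these assumptions a scientist with 50 papers and 500 citations gets an estimate of roughly 13.6. Any rounding convention for reporting an integer h is left open.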