Explaining and predicting the impact of authors within a community: an assessment of the bibliometric literature and application of machine learning

Sen Chai,Lee Fleming,Alexander D’Amour

doi:10.1093/icc/dtz042

Abstract

AbstractFollowing widespread availability of computerized databases, much research has correlated bibliometric measures from papers or patents to subsequent success, typically measured as the number of publications or citations. Building on this large body of work, we ask the following questions: given available bibliometric information in one year, along with the combined theories on sources of creative breakthroughs from the literatures on creativity and innovation, how accurately can we explain the impact of authors in a given research community in the following year? In particular, who is most likely to publish, publish highly cited work, and even publish a highly cited outlier? And, how accurately can these existing theories predict breakthroughs using only contemporaneous data? After reviewing and synthesizing (often competing) theories from the literatures, we simultaneously model the collective hypotheses based on available data in the year before RNA interference was discovered. We operationalize author impact using publication count, forward citations, and the more stringent definition of being in the top decile of the citation distribution. Explanatory power of current theories altogether ranges from less than 9% for being top cited to 24% for productivity. Machine learning (ML) methods yield similar findings as the explanatory linear models, and tangible improvement only for non-linear Support Vector Machine models. We also perform predictions using only existing data until 1997, and find lower predictability than using explanatory models. We conclude with an agenda for future progress in the bibliometric study of creativity and look forward to ML research that can explain its models.

Full Text