The value of publicly available, textual and non-textual information for startup performance prediction

Ulrich Kaiser, Johan M Kuhn

doi:10.1016/j.jbvi.2020.e00179

Abstract

We use administrative textual and non-textual data retrieved from publicly available archives to predict the performance of Danish startups at the time of foundation. The performance outcomes we consider are survival, high employment growth, a return on assets of above 20 percent, new patent applications and participation in an innovation subsidy program. We consider a base specification that includes variables for legal form, region, ownership and industry in all specifications and add variable sets representing firm names, business purpose statements (BPSs) as well as founder and startup characteristics. To forecast the two innovation-related performance outcomes well, we only need to include a set of variables derived from the BPS texts on top of the base variables while an accurate prediction of startup survival requires the combination of the firm names and the BPS variables along with founder characteristics. An accurate forecast of high employment growth needs the combination of the BPS variables and the founder characteristics. All information our forecasts require is likely to be easily obtainable since the underlying information is mandatory to report upon business registration in many countries. The substantial accuracy of our predictions for survival, employment growth, new patents and participation in innovation subsidy programs indicates ample scope for algorithmic scoring models as an additional pillar of funding and innovation support decisions.
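The specification design described in the abstract (a base set of variables present in every model, plus optional variable sets added on top) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the group identifiers, the helper names `enumerate_specifications` and `bag_of_words`, and the word-count featurizer for BPS texts are all assumptions made for this example, since the abstract does not state how the text-derived variables are constructed.

```python
from itertools import combinations

# Variable groups named in the abstract; the identifiers are illustrative.
BASE = ("legal_form", "region", "ownership", "industry")
OPTIONAL_SETS = ("firm_name", "bps", "founder", "startup")


def enumerate_specifications(optional_sets=OPTIONAL_SETS):
    """Return every specification: the base set plus any subset of optional sets."""
    specs = []
    for k in range(len(optional_sets) + 1):
        for combo in combinations(optional_sets, k):
            specs.append(("base",) + combo)
    return specs


def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary term in a text.

    Hypothetical stand-in for the text-derived BPS variables.
    """
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocabulary]


specs = enumerate_specifications()
print(len(specs))  # 16 candidate specifications (2**4 subsets of optional sets)
print(bag_of_words("Development and sale of software", ["software", "consulting"]))
```

Each candidate specification could then be fitted and compared per outcome (survival, high employment growth, and so on) to see which variable sets a given forecast actually needs.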

Highlights

Identifying promising startups is a formidable task for investors, creditors and policy makers alike
We study how well the initial G/S variables in combination with other publicly available and conveniently obtainable data can predict a broad range of performance outcomes: involuntary exit, high employment growth, a return on assets of above 20 percent, new patent applications and, as a more inclusive indicator of innovative activity, participation in an innovation subsidy program
We demonstrate that it is possible to accurately predict startup success, and we show that the data required to generate such accurate predictions may be readily available from public sources

Summary

DISCUSSION

To forecast the two innovation-related performance outcomes well, we only need to include a set of variables derived from the BPS texts, while an accurate prediction of startup survival and high employment growth needs the combination of (i) information derived from the names of the startups, (ii) data on elementary founder-related characteristics and (iii) either variables describing the initial characteristics of the startup (to predict startup survival) or business purpose statement information (to predict high employment growth). These sets of variables are obtainable since the underlying information is mandatory to report upon business registration.

We gratefully acknowledge useful feedback from Jorg Claussen, Martin Murmann, Christian Peukert, Wolfgang Sofka and, in particular, Reinhold Kesler.
