Abstract
In this paper, we leverage non-survey data (specifically, news articles), natural language processing (NLP), and deep learning methods to detect and measure innovation, with the ultimate aim of enriching innovation surveys. Our dataset comprises 1.9M news articles published between 2013 and 2018, acquired from Dow Jones Data, News, and Analytics. We use Bidirectional Encoder Representations from Transformers (BERT), a neural network-based technique for NLP pre-training developed by Google. Our methods involve: (i) using BERT as a binary classifier to identify articles that mention innovation, (ii) developing a BERT-based named-entity recognition model to extract company names from these articles, and (iii) leveraging BERT’s question-answering capability to extract company and product names. As a result, we obtain innovation indicators, i.e., company innovations in the pharmaceutical sector.
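The three steps described above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the paper's implementation: each BERT stage is replaced by a toy stand-in (a keyword check, a lookup list, a regular expression) so the wiring runs without model weights. All names (`Article`, `extract_innovations`, `innovation_counts`, the `toy_*` functions), the sample articles, and the choice of a per-company count as the indicator are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Article:
    article_id: str
    text: str


# Stage signatures. In the paper each stage is a BERT model; here they are
# plain callables so any implementation can be plugged in.
Classifier = Callable[[str], bool]                # (i) mentions innovation?
CompanyTagger = Callable[[str], List[str]]        # (ii) company names in text
ProductQA = Callable[[str, str], Optional[str]]   # (iii) product for a company


def extract_innovations(
    articles: Iterable[Article],
    classify: Classifier,
    tag_companies: CompanyTagger,
    ask_product: ProductQA,
) -> List[Tuple[str, str]]:
    """Run steps (i)-(iii) and return (company, product) pairs."""
    pairs = []
    for article in articles:
        if not classify(article.text):      # (i) drop non-innovation articles
            continue
        for company in tag_companies(article.text):        # (ii)
            product = ask_product(article.text, company)   # (iii)
            if product is not None:
                pairs.append((company, product))
    return pairs


def innovation_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Assumed indicator: number of distinct products found per company."""
    return Counter(company for company, _ in set(pairs))


# Toy stand-ins for the three models (hypothetical, for illustration only).
KNOWN_COMPANIES = ["Acme Pharma", "Borel Labs"]


def toy_classify(text: str) -> bool:
    return "launches" in text


def toy_tag_companies(text: str) -> List[str]:
    return [name for name in KNOWN_COMPANIES if name in text]


def toy_ask_product(text: str, company: str) -> Optional[str]:
    match = re.search(re.escape(company) + r" launches (\w+)", text)
    return match.group(1) if match else None


ARTICLES = [
    Article("a1", "Acme Pharma launches Zyvex, a new oral treatment."),
    Article("a2", "Acme Pharma reports quarterly earnings."),
    Article("a3", "Borel Labs launches Quintal for migraine."),
]

found = extract_innovations(
    ARTICLES, toy_classify, toy_tag_companies, toy_ask_product
)
print(found)
print(innovation_counts(found))
```

Passing the stages in as callables keeps the control flow separate from the models, so a fine-tuned classifier, tagger, or question-answering model could replace each stand-in without changing `extract_innovations`.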