Abstract

Measuring the diffusion of innovations from textual data sources besides patent data has not been studied extensively. However, early and accurate indicators of innovation and the recognition of trends in innovation are essential for promoting economic growth through technological progress via evidence-based policy making. In this study, we propose the Paragraph Vector Topic Model (PVTM) and apply it to technology-related news articles to analyze innovation-related topics over time and gain insights into their diffusion process. PVTM represents documents in a semantic space, which has been shown to capture latent variables of the underlying documents, e.g., the latent topics. Clusters of documents in the semantic space can then be interpreted and transformed into meaningful topics by means of Gaussian mixture modeling. Using PVTM, we identify innovation-related topics from 170,000 technology news articles published over a span of 20 years and gather insights about their diffusion state by measuring the topic importance in the corpus over time. Our results suggest that PVTM is a credible alternative to widely used topic models for the discovery of latent topics in (technology-related) news articles. An examination of three exemplary topics shows that innovation diffusion could be assessed using topic importance measures derived from PVTM. We further find that PVTM diffusion indicators for certain topics Granger-cause Google Trends indices with matching search terms.

Highlights

  • The rapidly growing amount of digital information provides novel data sources for economic analysis with regard to identifying and measuring innovation trends

  • Our results suggest that Paragraph Vector Topic Model (PVTM) is well suited for topic modeling this type of text data

  • We focused on the Distributed Bag of Words (DBOW) methodology, as it has been shown to produce slightly better results compared to Distributed Memory (DM)

Summary

Introduction

The rapidly growing amount of digital information provides novel data sources for economic analysis with regard to identifying and measuring innovation trends. The authors of a related study utilized patent filings dating back to 1840 to estimate their novelty and significance by quantifying the impact on future technological innovations, using a time-aware term-weighting scheme based on term-frequency inverse-document-frequency (tf-idf) that they constructed for this purpose. With this approach, they capture technological evolution over a long time span and show that the captured trends are strong predictors of productivity at various levels. We use Paragraph Vector (also known as Doc2Vec [24]) to compute vector space representations of text documents and Gaussian mixture models (GaussMMs) to cluster the resulting document vectors into meaningful semantic topics. We call this combination of embedding and clustering Paragraph Vector Topic Modeling (PVTM).

Paragraph vector topic modeling
Neural embeddings of words and documents
Paragraph vector
Gaussian mixture clustering
Technology-related news corpus
Approaching the diffusion of innovations from related topics
Findings
Conclusion and outlook
