Abstract

Science and technology (S&T) linkages have been studied extensively using patent and scientific publication databases. Existing methods for tracking S&T linkages, such as analysis of non-patent literature (NPL) or author-inventor matching, offer only a narrow window for industry-level analysis. This paper examines the application of a machine learning algorithm, Latent Dirichlet Allocation (LDA), to detect semantic relationships between patent and scientific publication corpora. The case of “Taxol”, a cancer drug, is used to illustrate the performance of this unsupervised algorithm in clustering documents with similar topics. In total, 26 475 documents retrieved from the Europe PMC database were used as a sample for the analysis. Qualitative analysis of the clusters shows that the topic clustering algorithm is a valuable approach for detecting linkages between patents and publications.

