Abstract

This research aims to retrieve relevant patent documents and to identify the classification codes and search keywords that best characterize a given technological domain in the patent literature. The World Intellectual Property Organization (WIPO) has recorded a rising number of patent applications filed under the Patent Cooperation Treaty (PCT), which is becoming the norm for filing patents in multiple jurisdictions. PCT documents are therefore a valuable source of information on innovation activities that carry some degree of entrepreneurial intention. However, searching for relevant patent documents can be a daunting and uncertain process. We constructed a high-dimensional matrix, the code-keyword matrix, from two data types: classification codes and search keywords. Two machine learning algorithms, principal component analysis (PCA) and k-means clustering, were then used to derive insights from this high-dimensional dataset. The combined method yielded a two-dimensional PCA biplot and a clustering of an optimized PCA dataset called Eigen-PCA. With these algorithms, we identified correlations between the two data types. We also clustered the classification codes into least-relevance, medium-relevance, and high-relevance groups for the domain of anti-corrosion technologies, an impactful area for steel infrastructure in maritime environments. Such patent data analytics can be adapted to other areas, such as medical technologies, the green energy transition towards Net Zero, and the conservation of biological diversity.
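
As a rough illustration of the pipeline described above, the following minimal sketch applies PCA and then k-means to a small code-keyword matrix. Everything in it is an assumption made for illustration: the toy counts, the placeholder code and keyword labels, the helper names (`pca`, `kmeans`, `rank_by_relevance`), the farthest-first initialization, and the choice to rank clusters by mean keyword count. It clusters plain two-dimensional PCA scores and does not attempt to reproduce the Eigen-PCA dataset, whose construction is not specified in this abstract.

```python
import numpy as np

# Hypothetical toy code-keyword matrix: rows are classification codes,
# columns are search keywords, entries are co-occurrence counts.
codes = ["CODE_A", "CODE_B", "CODE_C", "CODE_D", "CODE_E", "CODE_F"]
keywords = ["kw_1", "kw_2", "kw_3", "kw_4"]
matrix = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [10, 9, 11, 10],
    [9, 10, 10, 11],
    [30, 29, 31, 30],
    [29, 31, 30, 29],
], dtype=float)


def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # row (code) coordinates
    loadings = Vt[:n_components].T                    # column (keyword) directions
    explained = (S ** 2) / (S ** 2).sum()             # variance ratio per component
    return scores, loadings, explained[:n_components]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        diffs = points[:, None, :] - np.array(centers)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centers.append(points[int(np.argmax(nearest))])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):                          # keep old center if cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def rank_by_relevance(X, labels, k):
    """Relabel clusters 0..k-1 by ascending mean row total (an assumed relevance proxy)."""
    totals = X.sum(axis=1)
    means = np.array([totals[labels == j].mean() for j in range(k)])
    rank = np.empty(k, dtype=int)
    rank[np.argsort(means)] = np.arange(k)
    return rank[labels]


scores, loadings, explained = pca(matrix, n_components=2)
labels, _ = kmeans(scores, k=3)
ranked = rank_by_relevance(matrix, labels, k=3)

names = ["least", "medium", "high"]
for code, r in zip(codes, ranked):
    print(code, names[r])
```

In a biplot, the rows of `scores` would be drawn as points (one per classification code) and the rows of `loadings` as arrows (one per keyword), so that codes and keywords pointing in similar directions can be read as correlated.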
