Abstract
We introduce a novel methodology for harmonizing and standardizing the locations associated with patent transactions recorded at the USPTO from 2005 to 2022. Combining natural language processing (NLP) techniques with search engine-based web knowledge graphs, the method comprises four phases: data pre-processing, semantic clustering, exploitation of web knowledge graphs, and API-driven harmonization. Starting from a dataset of 63,838 unique locations, the methodology reduces this number by more than 50%, with an accuracy of approximately 92%. The resulting geolocated dataset of companies’ patent transactions offers a valuable resource for fine-grained geographical analyses of the markets for technology; in particular, we provide examples of economic insights that can be drawn from the geographical patterns of those transactions.
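To give a concrete sense of how the first two phases could shrink the number of unique location strings, the following is a minimal sketch and not the implementation used in the study. It assumes that pre-processing amounts to case, accent, and punctuation normalization, and it substitutes plain character-level similarity from Python's standard library for the semantic clustering step. The function names `normalize` and `cluster`, the 0.85 threshold, and the sample strings are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(location: str) -> str:
    """Pre-processing stand-in: lowercase, strip accents and punctuation."""
    text = unicodedata.normalize("NFKD", location)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def cluster(locations, threshold=0.85):
    """Greedy clustering on string similarity (assumed stand-in for
    semantic clustering). Each raw string joins the first cluster whose
    representative is at least `threshold` similar, else starts a new one.
    """
    clusters = []  # list of (normalized representative, raw members)
    for raw in locations:
        key = normalize(raw)
        for representative, members in clusters:
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                members.append(raw)
                break
        else:
            clusters.append((key, [raw]))
    return clusters


# Hypothetical sample: three spellings of one city plus a distinct one.
sample = ["San José, CA", "SAN JOSE CA", "San Jose, CA.", "Munich, DE"]
clusters = cluster(sample)
```

On this sample the three San Jose variants collapse into one cluster and Munich stays separate. In the pipeline described above, the later phases (web knowledge graphs and API-driven harmonization) would then resolve each cluster to a standardized location, which this sketch does not attempt.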