Abstract

Patent classification systems and citation networks are used extensively in innovation studies. However, these conventional approaches to patent analysis have two limitations: classification codes do not map uniquely onto specific products or markets, and citation linkages alone do not accurately capture knowledge flows. We present a hierarchical technique based on natural language processing that automatically identifies and classifies the patents in a dataset into technology areas and sub-areas. The key novelty of our technique is its use of topic modeling to map patents to probability distributions over real-world categories (topics). We test the accuracy and usefulness of the technique on a dataset of 10,201 solar photovoltaics patents filed with the United States Patent and Trademark Office (USPTO) between 2002 and 2013. We show that linguistic features from topic models can be used to effectively identify the main technology area to which a patent's invention applies. Our computational experiments support the view that the topic distribution of a patent offers a reduced-form representation of the patent's knowledge content. Accordingly, we suggest that this hidden thematic structure in patents can be useful in studies of the policy–innovation–geography nexus. To that end, we also demonstrate an application of our technique to identifying patterns of technological convergence.
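As a rough illustration of what a topic distribution offers as a reduced-form representation, the sketch below assumes each patent has already been mapped to a probability distribution over topics by some topic model. The topic labels, patent names, and probabilities are hypothetical toy values, not data from the study, and the helper functions `main_topic` and `js_divergence` are illustrative names rather than part of the technique described above. Taking the highest-probability topic as the main technology area, and comparing patents with the Jensen-Shannon divergence, are assumptions made for the example.

```python
import math

# Hypothetical topic labels and per-patent topic distributions (toy values,
# not taken from the study's dataset). Each distribution sums to 1.
TOPICS = ["cell materials", "module assembly", "power electronics"]

patent_topics = {
    "patent_a": [0.70, 0.20, 0.10],
    "patent_b": [0.10, 0.30, 0.60],
    "patent_c": [0.65, 0.25, 0.10],
}


def main_topic(dist, labels=TOPICS):
    """Return the label of the highest-probability topic (assumed rule)."""
    best = max(range(len(dist)), key=lambda i: dist[i])
    return labels[best]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    for name, dist in patent_topics.items():
        print(name, "->", main_topic(dist))
    # Lower divergence suggests more similar thematic content.
    print(js_divergence(patent_topics["patent_a"], patent_topics["patent_c"]))
    print(js_divergence(patent_topics["patent_a"], patent_topics["patent_b"]))
```

In this toy setup, `patent_a` and `patent_c` share a dominant topic and have a small divergence, while `patent_a` and `patent_b` diverge more. Tracking how such divergences between technology areas change over time is one plausible way to look for convergence, though the abstract does not specify the measure actually used.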
