Abstract

Patent data represent a significant source of information on innovation, knowledge production, and the evolution of technology through networks of citations, co-invention and co-assignment. A major obstacle to extracting useful information from these data is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in knowledge production and diffusion. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on about 8.5 million patents found in the European Patent Office (EPO), under the Patent Cooperation Treaty (PCT), and in the US Patent and Trademark Office (USPTO). We show this disambiguation is consistent with a number of ground-truth benchmarks of both assignees and inventors, significantly outperforming the use of undisambiguated names to identify unique entities. A key benefit of this work is the high-quality assignee disambiguation with coverage across the world, coupled with an inventor disambiguation (competitive with other state-of-the-art approaches) in multiple patent offices.

Highlights

  • In many contexts, technological progress and innovation are essential to national or regional economic growth and output[1]

  • This is a difficult task, as there are millions of names to disambiguate, and the likelihood that two names on patents refer to the same entity is not known a priori and is often estimated with machine learning techniques[13,15,16,17,18]. This disambiguation problem has been effectively approached in recent years in the context of patent data using Bayesian methods[15], Markov Chain Monte Carlo approaches[19,20], structural equivalence or other network similarity properties[17,21], and supervised machine learning[22], with a significant effort continuing at the USPTO[18] with a focus on inventor disambiguation using an efficient hierarchical clustering approach[19,20]

  • We combine a number of distinct databases covering different patent offices: the US Patent and Trademark Office (USPTO), the European Patent Office (EPO), and patents filed under the Patent Cooperation Treaty (PCT)


Summary

Background & Summary

Technological progress and innovation are essential to national or regional economic growth and output[1]. The OECD provides the HAN database[23], which corrects for a variety of alternate spellings and legal distinctions of assignees for patents in the USPTO, EPO, and PCT. Each of these algorithms has been shown to be accurate and useful in certain contexts, but none provides a broad and unified approach to the disambiguation of assignee and inventor names in multiple patent offices. We believe the breadth and level of detail in the database produced by this work will be of great value to researchers, and we will make the data freely available for noncommercial use.
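
To make the idea of correcting for alternate spellings and legal distinctions concrete, the following is a minimal toy sketch and not the algorithm described in this paper. It assumes each record is a (name, (latitude, longitude)) pair, uses a hypothetical short list of legal-form suffixes, and uses an arbitrary 50 km distance threshold. Records receive the same identifier when their normalized names are equal and their locations are close to a record already in that cluster.

```python
import math
import re
from collections import defaultdict

# Hypothetical suffix list for illustration; a real system would need a far larger one.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "gmbh", "ag", "sa"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disambiguate(records, max_km=50.0):
    """Greedy pass: reuse the id of the first earlier record that has the same
    normalized name and lies within max_km; otherwise start a new id.
    Clusters are never merged after the fact (a simplification)."""
    seen = defaultdict(list)  # normalized name -> list of (id, location)
    ids = []
    next_id = 0
    for name, loc in records:
        key = normalize(name)
        match = next(
            (cid for cid, other in seen[key] if haversine_km(loc, other) <= max_km),
            None,
        )
        if match is None:
            match = next_id
            next_id += 1
        seen[key].append((match, loc))
        ids.append(match)
    return ids


records = [
    ("Acme Corp.", (40.71, -74.00)),        # New York area
    ("ACME Corporation", (40.73, -73.99)),  # New York area, different spelling
    ("Acme Corp", (48.14, 11.58)),          # Munich: same name, distant location
    ("Widget GmbH", (48.14, 11.58)),        # Munich: different name
]
print(disambiguate(records))
```

In this toy example the two New York records share an identifier, while the Munich record with the same normalized name is kept separate, which illustrates why location can help distinguish entities that names alone would merge.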

