Abstract
Bibliographic name disambiguation is an major semantic challenge, but critical to social sciences studies of important intellectual assets. Here we contribute to innovation research in several ways. We show a significant synonym problem in author names and discuss how a pre-processing heuristic step standardizing name variants helps, but homonyms generated with Chinese names are particularly difficult to resolve and manifest in an associated location list. Here we identify a new phenomenon of "onomastic profusion," the frequent use of certain words in firm names for semantic reasons that can confound disambiguation clustering algorithms. We illustrate these concerns with Patentopia, our customized platform accessing the PatentsView portal for the United States Patent and Trademark Office database and available for free academic use. This multi-stage system uses heuristics in concert with the PatentsView clustering process and reports meta-data to further assist analysis. As highly relevant use cases, we illustrate system performance with data derived from two important public innovation programs, I-Corps and Small Business Innovation Research (SBIR), and we close with implications for bibliometric analysis of current patent data.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.