Abstract

Existing record linkage methods often rely on ad hoc weights, thresholds, and decision rules that can introduce unwanted, systematic errors into the results. We use a unique set of 98,762 labeled inventor records (i.e., records for which we know the true identity of the individual or entity), obtained from 824 USPTO optoelectronics inventors, to test the accuracy of these existing methods. We then develop a new algorithm, a variant of the Random Forests classification method, to predict whether or not pairs of USPTO inventor records match. Our new approach to inventor disambiguation, which we call Conditional Forest of Random Forests, reduces false positive and false negative error rates by 84.5% and 92.7%, respectively, relative to existing USPTO inventor disambiguation algorithms on our set of labeled inventor records. Unlike the errors of the existing algorithms, ours do not occur systematically. The substantial reduction in disambiguation error, and its less systematic occurrence, suggest that research using the results of existing disambiguation methods should be revisited, because the systematic bias of the errors in those results may affect the conclusions of these and subsequent studies.
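
To make the error-rate comparison concrete, the following is a minimal sketch of how pairwise false positive and false negative rates, and the relative reduction between two methods, could be computed from labeled records. It is not the evaluation code behind the figures above. The function names, the toy labels, and the exact rate definitions (false positives over true non-matching pairs, false negatives over true matching pairs) are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_error_rates(true_ids, predicted_ids):
    """Return (false_positive_rate, false_negative_rate) over all record pairs.

    Assumed definitions (illustrative only):
      false positive = pair predicted to match whose true identities differ
      false negative = pair predicted not to match whose true identities agree
    """
    fp = fn = true_matches = true_nonmatches = 0
    for i, j in combinations(range(len(true_ids)), 2):
        same_true = true_ids[i] == true_ids[j]
        same_pred = predicted_ids[i] == predicted_ids[j]
        if same_true:
            true_matches += 1
            if not same_pred:
                fn += 1
        else:
            true_nonmatches += 1
            if same_pred:
                fp += 1
    fpr = fp / true_nonmatches if true_nonmatches else 0.0
    fnr = fn / true_matches if true_matches else 0.0
    return fpr, fnr


def relative_reduction(baseline_rate, new_rate):
    """Percentage by which new_rate is lower than baseline_rate."""
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


# Toy labeled records: six records belonging to three true inventors.
true_ids = ["A", "A", "A", "B", "B", "C"]
# Hypothetical cluster assignments from a baseline and an improved method.
baseline_clusters = [1, 1, 2, 2, 2, 2]
improved_clusters = [1, 1, 1, 2, 3, 3]

base_fpr, base_fnr = pairwise_error_rates(true_ids, baseline_clusters)
new_fpr, new_fnr = pairwise_error_rates(true_ids, improved_clusters)

fp_reduction = relative_reduction(base_fpr, new_fpr)
fn_reduction = relative_reduction(base_fnr, new_fnr)
print(f"False positive rate reduced by {fp_reduction:.1f}%")
print(f"False negative rate reduced by {fn_reduction:.1f}%")
```

Evaluating over record pairs rather than individual records mirrors the pairwise framing of the matching task, in which each pair of records is classified as a match or a non-match.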
