Measuring the Novelty of U.S. Patents

Jonathan H Ashtor

doi:10.2139/ssrn.3172298

Abstract

Forward citations are arguably the most widely used empirical metric for patents, serving as indicators of patent information content, cumulative innovation, value, and knowledge flows. However, forward citations have major shortcomings. Citations require long time horizons to accrue, and therefore they cannot be observed until several years after a patent issues. Citation data are often noisy, discontinuous, and highly skewed, complicating empirical analysis. Moreover, recent studies have questioned the reliability of citation data. As such, the most widely used empirical metric of patents is also the most suspect. This study constructs a patent measure that correlates with forward citations but is observable ex ante, either immediately upon patent issuance or even earlier, upon publication of a patent application. In addition, this measure is continuous and evenly distributed, such that it is suitable for large‐scale patent analytics applications. Finally, unlike citations, the measure is portable across patent systems, facilitating cross‐border comparisons of portfolios and datasets. Specifically, I construct a measure of the similarity of a patent to its technological‐temporal cohort, based on linguistic analysis of claim text. I employ advanced computational linguistic techniques to analyze the claims of all U.S. patents issued in the period 1976–2017, over 6 million patents in total, and I calculate the average degree of conceptual similarity of each patented invention to all others in the same technology field and time period cohort. I then extend the methodology to all issued EP patents, over 1.6 million in total. I validate the resulting measures against multiple established patent metrics for U.S. and EP patents. I test the robustness of this measure as a predictor of future patent citations in empirical research and big‐data applications. I find that cohort similarity correlates significantly with forward citations received by both U.S. and EP patents.
Cohort similarity also substitutes for citations in leading prior studies of R&D output and innovation. Finally, I demonstrate that, unlike citations, cohort similarity is comparable across the U.S. and EP patent systems. Accordingly, cohort similarity may be useful for empirical patent research, comparative studies of patent policy, and analytics of large‐scale patent portfolios.
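
To make the cohort-similarity idea concrete, the following is a minimal sketch, not the method used in the study. The abstract does not specify the linguistic technique, so this example substitutes plain bag-of-words cosine similarity over claim text as a stand-in. The record keys ("id", "tech_class", "year", "claims"), the function names, and the sample patents are all hypothetical and chosen only for illustration. Each patent's score is the average similarity of its claims to the claims of every other patent in the same technology class and year.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cohort_similarity(patents):
    """Map each patent id to its mean similarity to the rest of its cohort.

    `patents` is a list of dicts with the hypothetical keys
    "id", "tech_class", "year", and "claims". A patent that is alone
    in its cohort gets None, since it has no peers to compare against.
    """
    # Group patents by (technology class, year); this pairing is an
    # assumed stand-in for the "technological-temporal cohort".
    cohorts = defaultdict(list)
    for p in patents:
        key = (p["tech_class"], p["year"])
        cohorts[key].append((p["id"], Counter(tokenize(p["claims"]))))

    scores = {}
    for members in cohorts.values():
        for pid, vec in members:
            peers = [cosine(vec, other) for oid, other in members if oid != pid]
            scores[pid] = sum(peers) / len(peers) if peers else None
    return scores


# Toy data, invented for illustration only.
patents = [
    {"id": "A", "tech_class": "H04", "year": 2001, "claims": "a wireless antenna array"},
    {"id": "B", "tech_class": "H04", "year": 2001, "claims": "a wireless antenna array"},
    {"id": "C", "tech_class": "H04", "year": 2001, "claims": "chemical polymer coating"},
    {"id": "D", "tech_class": "C08", "year": 2001, "claims": "chemical polymer coating"},
]
scores = cohort_similarity(patents)
```

In this toy data, patents A and B share identical claims and so score higher than C, whose claims share no terms with its cohort; D has no cohort peers and receives no score. The pairwise loop is quadratic in cohort size, so a corpus of millions of patents would presumably call for vectorized or approximate methods instead.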

Full Text