Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents

Sergio Pelaez, Gaurav Verma, Barbara Ribeiro, Philip Shapira

doi:10.1162/qss_a_00285

Abstract

We put forward a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using text (5.4 million sentences) for 154,934 US AI patent documents from the United States Patent and Trademark Office (USPTO), we design a semi-automated, human-supervised framework for identifying and labeling public value expressions in these sentences. A GPT-4 prompt is developed that includes definitions, guidelines, examples, and rationales for text classification. We evaluate the labels and rationales produced by GPT-4 using BLEU scores and topic modeling, finding that they are accurate, diverse, and faithful. GPT-4 achieved an advanced recognition of public value expressions from our framework, which it also uses to discover unseen public value expressions. The GPT-produced labels are used to train BERT-based classifiers and predict sentences on the entire database, achieving high F1 scores for the 3-class (0.85) and 2-class classification (0.91) tasks. We discuss the implications of our approach for conducting large-scale text analyses with complex and abstract concepts. With careful framework design and interactive human oversight, we suggest that generative language models can offer significant assistance in producing labels and rationales.
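As an illustration of the F1 metric reported above, the following is a minimal sketch of per-class and macro-averaged F1 for a multi-class sentence classifier. It is not the authors' evaluation code: the label names (`direct`, `indirect`, `none`) and the toy data are hypothetical, and the abstract does not state which averaging scheme was used, so macro averaging here is an assumption.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the paper.
gold = ["direct", "indirect", "none", "none", "direct", "indirect"]
pred = ["direct", "none", "none", "none", "direct", "indirect"]

per_class = f1_per_class(gold, pred)
overall = macro_f1(gold, pred)
```

With these toy labels, one `indirect` sentence is misclassified as `none`, which lowers the F1 of both classes while `direct` stays perfect.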

Journal: Quantitative Science Studies
Publication Date: Mar 1, 2024
Citations: 3
License type: CC BY 4.0

Similar Papers

Prioritization: Addressing the Patent Application Backlog at the United States Patent and Trademark Office
18 Feb 2014

Dynamics of Topics in Antimalarial Patents: Comparison Between the USPTO and SIPO
Bo Kyeong Lee ... So Young Sohn
01 Jan 2019

Patent Applications and the Performance of the U.S. Patent and Trademark Office
Christopher Anthony Cotropia ... Ogden H Webster
SSRN Electronic Journal | VOL. 23
03 Mar 2013

Worldwide nanotechnology development: a comparative study of USPTO, EPO, and JPO patents (1976–2004)
Xin Li ... Yiling Lin
Journal of Nanoparticle Research | VOL. 9
27 Jul 2007
