Abstract

To meet the growing demand for large volumes of training data in model development, this paper proposes LLM4Label, an automatic text labeling method based on large language models that assists human labelers in annotating text data. LLM4Label first selects the most representative seed data with a clustering algorithm based on text similarity. It then builds prompt dialogues with few-shot examples to elicit the language model's ability on entity labeling tasks, so that the model can label additional data automatically and efficiently. Finally, LLM4Label introduces human feedback to correct uncertain labeling results and retrains the model with the corrected annotations. Experiments show that LLM4Label produces high-quality labeled data at a low human labeling cost. The proposed method offers an effective way to obtain large, high-quality annotated datasets with minimal manual effort, which can strongly support downstream natural language processing tasks.
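The abstract describes the three stages (seed selection, few-shot prompting, human review of uncertain labels) only at a high level. The following is a minimal sketch of that flow under stated assumptions, not LLM4Label's actual implementation. Bag-of-words cosine similarity and a simple k-medoids loop stand in for the unspecified clustering algorithm. `build_prompt` is a hypothetical few-shot template. `route` assumes each prediction carries a confidence score and uses a hypothetical threshold of 0.8. No language model is called, and all function names are illustrative.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_seeds(texts, k, iters=10):
    """Stage 1 (assumed technique): pick k representative texts via k-medoids
    on cosine distance. Returns the sorted indices of the chosen medoids."""
    n = len(texts)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(texts)")
    vecs = [bow(t) for t in texts]
    dist = [[0.0 if i == j else 1.0 - cosine(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))  # deterministic initialisation: the first k texts
    for _ in range(iters):
        # Assign every text to its nearest medoid; a medoid always keeps itself.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # The new medoid of each cluster minimises total distance to its members.
        new = [min(members, key=lambda c: sum(dist[c][j] for j in members))
               for members in clusters.values()]
        if sorted(new) == sorted(medoids):
            break  # converged
        medoids = new
    return sorted(medoids)


def build_prompt(examples, query):
    """Stage 2 (hypothetical template): labelled seed examples, then the query."""
    parts = ["Label the entities in each sentence."]
    for text, labels in examples:
        parts.append(f"Sentence: {text}\nEntities: {labels}")
    parts.append(f"Sentence: {query}\nEntities:")
    return "\n\n".join(parts)


def route(predictions, threshold=0.8):
    """Stage 3 (assumed confidence scores): split (text, label, confidence)
    triples into auto-accepted labels and labels sent for human review."""
    accepted = [p for p in predictions if p[2] >= threshold]
    review = [p for p in predictions if p[2] < threshold]
    return accepted, review
```

In this sketch the human-corrected items returned in `review` would be merged back with `accepted` to form the data used for retraining; how LLM4Label measures uncertainty and retrains is not specified in the abstract.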

