ArcheType: A Novel Framework for Open-Source Column Type Annotation Using Large Language Models

Benjamin Feuer, Juliana Freire, Yurong Liu, Chinmay Hegde

doi:10.14778/3665844.3665857

Abstract

Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types that are fixed at training time; require a large number of training samples per type; incur high run-time inference costs; and their performance can degrade when evaluated on novel datasets, even when types remain constant. Large language models have exhibited strong zero-shot classification performance on a wide range of tasks, and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner. We ablate each component of our method separately and establish that improvements to context sampling and label remapping provide the most consistent gains. ArcheType establishes new state-of-the-art performance on zero-shot CTA benchmarks (including three new domain-specific benchmarks, which we release along with this paper), and when used in conjunction with classical CTA techniques, it outperforms a SOTA DoDuo model on the fine-tuned SOTAB benchmark.
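The four stages named in the abstract (context sampling, prompt serialization, model querying, label remapping) can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how each stage works, so the frequency-based sampler, the prompt wording, the string-matching remapper, and every function name here (sample_context, serialize_prompt, remap_label, annotate_column, fake_model) are hypothetical. The model is passed in as a plain callable so that any LLM client could be substituted.

```python
import difflib
from collections import Counter
from typing import Callable, List, Sequence


def sample_context(column: Sequence[object], k: int = 5) -> List[str]:
    """Context sampling (assumed strategy): keep the k most frequent non-empty values."""
    values = [str(v).strip() for v in column if str(v).strip()]
    return [value for value, _ in Counter(values).most_common(k)]


def serialize_prompt(samples: Sequence[str], labels: Sequence[str]) -> str:
    """Prompt serialization (assumed wording): list the samples and the allowed types."""
    return (
        "Column values: " + ", ".join(samples) + "\n"
        "Allowed types: " + ", ".join(labels) + "\n"
        "Answer with exactly one allowed type."
    )


def remap_label(raw: str, labels: Sequence[str]) -> str:
    """Label remapping (assumed strategy): map free-form model output onto the label set.

    Tries an exact match, then substring containment, then the closest string match.
    Assumes `labels` is non-empty.
    """
    cleaned = raw.strip().lower()
    lowered = {label.lower(): label for label in labels}
    if cleaned in lowered:
        return lowered[cleaned]
    contained = [label for key, label in lowered.items() if key in cleaned]
    if contained:
        return max(contained, key=len)
    # cutoff=0.0 guarantees a best-effort match is always returned
    best = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]]


def annotate_column(
    column: Sequence[object],
    labels: Sequence[str],
    model: Callable[[str], str],
    k: int = 5,
) -> str:
    """Run the four stages in order; `model` is any function from prompt to text."""
    samples = sample_context(column, k)
    prompt = serialize_prompt(samples, labels)
    raw_answer = model(prompt)  # model querying
    return remap_label(raw_answer, labels)


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to make the sketch runnable."""
    return "I think this column is Country."


if __name__ == "__main__":
    column = ["France", "Peru", "France", "", "Japan"]
    print(annotate_column(column, ["person", "country", "date"], fake_model))
```

Because no training step appears anywhere in the sketch, the label set can be changed per call, which is the zero-shot property the abstract emphasizes. The remapping step is what lets a verbose or slightly off-vocabulary model answer still resolve to one of the allowed types.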

Journal: Proceedings of the VLDB Endowment
Publication Date: May 1, 2024
Citations: 1
