TabSAL: Synthesizing Tabular data with Small agent Assisted Language models

Jiale Li, Run Qian, Yandan Tan, Zhixin Li, Luyu Chen, Sen Liu, Jie Wu, Hongfeng Chai

doi:10.1016/j.knosys.2024.112438

Abstract

Tabular data are widely used in machine-learning tasks because of their prevalence in various fields; however, the risk of data breaches and privacy-protection regulations make such data largely unavailable. Tabular data generation methods alleviate this unavailability by synthesizing data that carry no privacy risk, and generating data with language models is a recent approach. Language models can synthesize high-quality datasets by learning knowledge from nondestructive information and recognizing the semantics of table columns. However, when current language models function as generators, their encoding methods are hindered by complicated decoding processes, and the limited predictive ability of the language models restricts their generative capability. To this end, we propose an encoding method for converting tabular data that is based on interactive data structures such as JavaScript Object Notation (JSON). We also design TabSAL, a pluggable tabular data generation framework with small agent assisted language models, to boost predictive capability, resulting in high-quality synthetic datasets at a much lower computational cost. In addition, we release a benchmark that integrates eight datasets, three methods, and three assessment directions; it indicates that TabSAL surpasses the state of the art by up to 60%.
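
To make the encoding idea concrete, the following is a minimal sketch of how a table row might be serialized to JavaScript Object Notation for a language model and parsed back afterwards. The abstract does not specify the actual format, so the function names (`encode_row`, `decode_row`), the column names, and the validation rule are all illustrative assumptions, not the authors' implementation.

```python
import json


def encode_row(row):
    """Serialize one table row (column name -> value) as a JSON string.

    Hypothetical helper: the exact encoding used by TabSAL is not given
    in the abstract.
    """
    return json.dumps(row)


def decode_row(text, columns):
    """Parse generated text back into a row.

    Returns None when the text is not valid JSON or when its keys do not
    match the expected columns, so malformed samples can be discarded.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != set(columns):
        return None
    # Restore the original column order.
    return {c: obj[c] for c in columns}


# Example row with made-up column names and values.
row = {"age": 39, "workclass": "Private", "income": "<=50K"}
encoded = encode_row(row)
decoded = decode_row(encoded, list(row))
```

Because the text is a standard JSON object, decoding reduces to one parser call plus a key check, which is one plausible reading of why such a structure would avoid a complicated decoding step.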

Similar Papers

Tabular and latent space synthetic data generation: a literature review
Joao Fonseca ... Fernando Bacao
Journal of Big Data | VOL. 10
10 Jul 2023

Observatory: Characterizing Embeddings of Relational Tables
Tianji Cong ... H V Jagadish
Proceedings of the VLDB Endowment | VOL. 17
01 Dec 2023

TLTD: Transfer Learning for Tabular Data
Maxim Bragilovski ... Shelly Levy-Tzedek
Applied Soft Computing | VOL. 147
14 Aug 2023

Deep Tabular Data Modeling With Dual-Route Structure-Adaptive Graph Networks
Qinghua Zheng ... Zhen Peng
IEEE Transactions on Knowledge and Data Engineering | VOL. 35
01 Sep 2023
