Integrating Machine Learning Utility in Tabular Data Synthesizer Training using Loss Function Learning

Muhammad Rizqi Nur, Rarasmaya Indraswari

doi:10.28926/ilkomnika.v6i2.646

Abstract

Machine learning (ML) utility has been the main evaluation metric for data synthesizers. However, because ML utility cannot be calculated directly, no previous synthesizer has been trained with ML utility as a training objective. This study aims to integrate ML utility into data synthesizer training using a transformer-based model as a learned loss function. The transformer was trained to estimate the ML utility of synthetic datasets; it is then integrated into synthesizer training by backpropagating the difference between the estimated and expected values. The integration significantly improved the average ML utility of LCT-GAN and Realtabformer. The ML utility of LCT-GAN improved by 0.0158 on the Contraceptive dataset, 0.031 on the Insurance dataset, and 0.0561 on the Treatment dataset. The ML utility of Realtabformer improved by 0.02 on the Contraceptive dataset and 0.0024 on the Insurance dataset. The increase also affects the data distribution, the correlation between features, and privacy, although the direction of the effect varies. Correlation coefficients indicate that the synthetic data distribution gets closer to the real data as ML utility improves. In addition to integrating ML utility, this study shows that patterns between rows in a dataset can be learned, so better synthesizers can be developed based on them.
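The integration step described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the frozen estimator is a mean-pool followed by a logistic layer (a stand-in for the transformer), the synthesizer is a toy with one learnable shift per column (a stand-in for LCT-GAN or Realtabformer), and the names `estimate_utility`, `synthesize`, `utility_loss_and_grad`, and `train` are hypothetical. It only shows the general idea of pushing the gap between estimated and expected utility back into the synthesizer's parameters while the estimator stays fixed.

```python
import math

# Hypothetical stand-in for the learned loss function: a frozen estimator that
# maps a synthetic dataset (a list of rows) to an estimated ML utility in (0, 1).
# The paper uses a transformer; a mean-pool plus logistic layer is used here
# only to keep the sketch small. These weights are assumed to be pre-trained.
ESTIMATOR_WEIGHTS = [1.5, -2.0]
ESTIMATOR_BIAS = 0.1


def estimate_utility(rows):
    """Estimate ML utility from the column means of the synthetic rows."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    z = sum(w * m for w, m in zip(ESTIMATOR_WEIGHTS, means)) + ESTIMATOR_BIAS
    return 1.0 / (1.0 + math.exp(-z))


def synthesize(shift, noise):
    """Toy synthesizer: each row is fixed noise plus a learnable column shift."""
    return [[e + s for e, s in zip(row, shift)] for row in noise]


def utility_loss_and_grad(shift, noise, expected):
    """Squared gap between estimated and expected utility, with its gradient
    with respect to the synthesizer parameters (the estimator is not updated)."""
    est = estimate_utility(synthesize(shift, noise))
    diff = est - expected
    loss = diff ** 2
    # Chain rule: dloss/dest * dest/dz * dz/dshift_j.
    # d(mean_j)/d(shift_j) = 1 because the shift is added to every row.
    grad = [2.0 * diff * est * (1.0 - est) * w for w in ESTIMATOR_WEIGHTS]
    return loss, grad


def train(noise, expected=0.9, lr=0.5, steps=200):
    """Plain gradient descent on the synthesizer parameters only."""
    shift = [0.0 for _ in noise[0]]
    history = []
    for _ in range(steps):
        loss, grad = utility_loss_and_grad(shift, noise, expected)
        history.append(loss)
        shift = [s - lr * g for s, g in zip(shift, grad)]
    return shift, history


NOISE = [[0.2, 0.4], [-0.1, 0.3], [0.5, -0.2], [0.0, 0.1]]
shift, history = train(NOISE)
print("first loss:", history[0], "last loss:", history[-1])
```

In a real setup the hand-written gradient would be replaced by automatic differentiation, and this utility term would presumably be combined with the synthesizer's own training loss rather than used alone.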

More From: ILKOMNIKA: Journal of Computer Science and Applied Informatics

Similar Papers

Machine Learning and Deep Learning Methods for Enhancing Building Energy Efficiency and Indoor Environmental Quality – A Review
Paige Wenbin Tien ... John Kaiser Calautit
Energy and AI | VOL. 10
08 Aug 2022

Comparative Study of Transformer- and LSTM-Based Machine Learning Methods for Transient Thermal Field Reconstruction
Wiera Bielajewa ... Michelle Tindall
Computational Thermal Sciences: An International Journal | VOL. 16
01 Jan 2024

Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges
Xin Qi ... Jiajia Chen
Molecules | VOL. 29
18 Feb 2024

Constraint-Aware Learning for Fractional Flow Reserve Pullback Curve Estimation from Invasive Coronary Imaging
Dong Zhang ... Zhifan Gao
IEEE Transactions on Medical Imaging | VOL. PP
01 Jan 2024
