Abstract

Given the potential misuse of recent advances in synthetic text generation by language models (LMs), it is important to have the capacity to attribute authorship of synthetic text. While stylometric authorship attribution of organic (i.e., human-written) text has been quite successful, it is unclear whether similar approaches can be used to attribute a synthetic text to its source LM. We address this question with the key insight that synthetic texts carry subtle distinguishing marks inherited from their source LM, and that these marks can be leveraged by machine learning (ML) algorithms for attribution. We propose and test several ML-based attribution methods. Our best attributor, built using a fine-tuned version of XLNet (XLNet-FT), consistently achieves excellent accuracy scores (91% to a near-perfect 98%) in attributing the parent pre-trained LM behind a synthetic text. Our experiments show promising results across a range of settings where the synthetic text may be generated using pre-trained LMs, fine-tuned LMs, or varying text generation parameters.

Highlights

  • Recent advancements in natural language processing have enabled synthetic text generation that is often of comparable quality to organic text (Ippolito et al., 2020; Radford et al., 2019; Zellers et al., 2019; Gehrmann et al., 2019)

  • While prior research has shown promise in distinguishing between synthetic and organic text, very little has been done on attributing authorship to the language model (LM) that generated the synthetic text (Pan et al., 2020)

  • We propose several attributors, including ones that make use of stylometric features as well as static and dynamic embeddings. We evaluate these attributors on a corpus of 350,000 synthetic texts that we generated in a controlled manner using combinations of LMs, sampling parameters, and fine-tuning
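To make the stylometric attribution idea concrete, the sketch below builds a toy nearest-centroid attributor over a handful of hypothetical stylometric features (average word length, type-token ratio, punctuation rate). These three features are illustrative assumptions, not the paper's actual feature set; feature sets such as Writeprints, which the paper analyzes, are far richer, and the paper's strongest attributor is a fine-tuned XLNet classifier rather than this simple scheme.

```python
import math

# Hypothetical stylometric features: a tiny, illustrative subset of what
# real feature sets (e.g., Writeprints) include.
def stylometric_features(text):
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    type_token = len(set(w.lower() for w in words)) / max(len(words), 1)
    punct_rate = sum(text.count(p) for p in ".,;:!?") / max(len(text), 1)
    return [avg_word_len, type_token, punct_rate]

def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_centroids(labeled_texts):
    # labeled_texts: dict mapping an LM label to a list of sample texts.
    return {lm: centroid([stylometric_features(t) for t in texts])
            for lm, texts in labeled_texts.items()}

def attribute(text, centroids):
    # Attribute `text` to the LM whose feature centroid is nearest.
    feats = stylometric_features(text)
    return min(centroids, key=lambda lm: euclidean(feats, centroids[lm]))
```

A usage sketch: train on a few texts per LM with `fit_centroids`, then call `attribute` on an unseen text. Any separation this toy achieves comes entirely from how strongly the chosen features differ across sources, which mirrors the paper's key insight that LMs leave measurable marks in their output.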


Summary

Introduction

Recent advancements in natural language processing have enabled synthetic text generation that is often of comparable quality to organic text (Ippolito et al., 2020; Radford et al., 2019; Zellers et al., 2019; Gehrmann et al., 2019). Variations in the sampling parameters used while generating synthetic text, whether from pre-trained or fine-tuned LMs, can further impact text characteristics (Zellers et al., 2019). We design and evaluate ML-based techniques for attributing the LM and configuration used to generate a synthetic text. We do this in the context of four problem scenarios, each representing a variation of a threat posed by an adversary or malicious user. Our key insight for attributing the LM used by the adversary is that differences between LM architecture (i.e., layers, parameters), training (i.e., pre-training and fine-tuning), and generation techniques (i.e., sampling parameters) will leave their subtle mark on the generated synthetic texts.
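To illustrate why sampling parameters leave a statistical fingerprint, the sketch below implements generic temperature and top-k sampling over a vector of logits (pure stdlib; the function names and toy logits are illustrative, not the paper's code). Low temperature or small k concentrates probability mass on a few high-likelihood tokens, while high temperature spreads it out, so corpora generated under different settings have measurably different token statistics.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    # Temperature rescales logits before the softmax: values below 1
    # sharpen the distribution, values above 1 flatten it.
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Top-k sampling zeroes out all but the k highest-scoring tokens.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

For example, with `top_k=1` the draw is always the argmax token, and with a very low temperature it is the argmax with overwhelming probability; either setting yields noticeably more repetitive text than sampling at `temperature=1.0`, which is exactly the kind of distributional difference an attributor can exploit.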

Threat Model
Attributing pre-trained LMs
Attributing fine-tuned LMs to parent pre-trained LMs
Attributing pre-trained or fine-tuned LMs with different sampling parameters
Attributing fine-tuned variants of a pre-trained LM
Text Generation
Text generation parameters
Data for fine-tuning
Dataset details
Attributors
CNN with GloVe embeddings
Attributors from LM embeddings
Attributing fine-tuned LMs to the parent pre-trained LMs
Attributing LMs with different sampling parameters
Synthetic text attribution
Organic text attribution
Synthetic image attribution
Conclusion
Analysis of importance given by Decision Tree to Writeprints
Details of pre-trained language models used

