Abstract

Given the potential misuse of recent advances in synthetic text generation by language models (LMs), it is important to have the capacity to attribute authorship of synthetic text. While stylometric authorship attribution of organic (i.e., human-written) text has been quite successful, it is unclear whether similar approaches can be used to attribute a synthetic text to its source LM. We address this question with the key insight that synthetic texts carry subtle distinguishing marks inherited from their source LM, and that these marks can be leveraged by machine learning (ML) algorithms for attribution. We propose and test several ML-based attribution methods. Our best attributor, built using a fine-tuned version of XLNet (XLNet-FT), consistently achieves excellent accuracy scores (91% to a near-perfect 98%) in attributing the parent pre-trained LM behind a synthetic text. Our experiments show promising results across a range of settings where the synthetic text may be generated using pre-trained LMs, fine-tuned LMs, or varying text generation parameters.

Highlights

  • Recent advancements in natural language processing have enabled synthetic text generation that is often of comparable quality to organic text (Ippolito et al., 2020; Radford et al., 2019; Zellers et al., 2019; Gehrmann et al., 2019)

  • While prior research has shown promise in distinguishing between synthetic and organic text, very little has been done on attributing authorship to the language model (LM) that generated the synthetic text (Pan et al., 2020)

  • We propose several attributors, including ones that make use of stylometric features as well as static and dynamic embeddings. We evaluate these attributors on a corpus of 350,000 synthetic texts that we generated in a controlled manner using combinations of LMs, sampling parameters, and fine-tuning
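To make the stylometric attribution idea concrete, the sketch below builds a toy nearest-centroid attributor over a handful of hypothetical stylometric features (average word length, type-token ratio, punctuation rate). These three features are illustrative assumptions, not the paper's actual feature set; feature sets such as Writeprints, which the paper analyzes, are far richer, and the paper's strongest attributor is a fine-tuned XLNet classifier rather than this simple scheme.

```python
import math

# Hypothetical stylometric features: a tiny, illustrative subset of what
# real feature sets (e.g., Writeprints) include.
def stylometric_features(text):
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    type_token = len(set(w.lower() for w in words)) / max(len(words), 1)
    punct_rate = sum(text.count(p) for p in ".,;:!?") / max(len(text), 1)
    return [avg_word_len, type_token, punct_rate]

def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_centroids(labeled_texts):
    # labeled_texts: dict mapping an LM label to a list of sample texts.
    return {lm: centroid([stylometric_features(t) for t in texts])
            for lm, texts in labeled_texts.items()}

def attribute(text, centroids):
    # Attribute `text` to the LM whose feature centroid is nearest.
    feats = stylometric_features(text)
    return min(centroids, key=lambda lm: euclidean(feats, centroids[lm]))
```

A usage sketch: train on a few texts per LM with `fit_centroids`, then call `attribute` on an unseen text. Any separation this toy achieves comes entirely from how strongly the chosen features differ across sources, which mirrors the paper's key insight that LMs leave measurable marks in their output.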


Summary

Introduction

Recent advancements in natural language processing have enabled synthetic text generation that is often of comparable quality to organic text (Ippolito et al., 2020; Radford et al., 2019; Zellers et al., 2019; Gehrmann et al., 2019). Variations in the sampling parameters used while generating synthetic text, whether from pre-trained or fine-tuned LMs, can further impact text characteristics (Zellers et al., 2019). We design and evaluate ML-based techniques for attributing the LM and configuration used to generate a synthetic text. We do this in the context of four problem scenarios, each representing a variation of a threat posed by an adversary or malicious user. Our key insight for attributing the LM used by the adversary is that differences between LM architecture (i.e., layers, parameters), training (i.e., pre-training and fine-tuning), and generation techniques (i.e., sampling parameters) will leave their subtle mark on the generated synthetic texts.
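To illustrate why sampling parameters leave a statistical fingerprint, the sketch below implements generic temperature and top-k sampling over a vector of logits (pure stdlib; the function names and toy logits are illustrative, not the paper's code). Low temperature or small k concentrates probability mass on a few high-likelihood tokens, while high temperature spreads it out, so corpora generated under different settings have measurably different token statistics.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    # Temperature rescales logits before the softmax: values below 1
    # sharpen the distribution, values above 1 flatten it.
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Top-k sampling zeroes out all but the k highest-scoring tokens.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

For example, with `top_k=1` the draw is always the argmax token, and with a very low temperature it is the argmax with overwhelming probability; either setting yields noticeably more repetitive text than sampling at `temperature=1.0`, which is exactly the kind of distributional difference an attributor can exploit.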

Threat Model
Attributing pre-trained LMs
Attributing fine-tuned LMs to parent pre-trained LMs
Attributing pre-trained or fine-tuned LMs with different sampling parameters
Attributing fine-tuned variants of a pre-trained LM
Text Generation
Text generation parameters
Data for fine-tuning
Dataset details
Attributors
CNN with GloVe embeddings
Attributors from LM embeddings
Attributing fine-tuned LMs to the parent pre-trained LMs
Attributing LMs with different sampling parameters
Synthetic text attribution
Organic text attribution
Synthetic image attribution
Conclusion
Analysis of importance given by Decision Tree to Writeprints
Details of pre-trained language models used

