Tracing Text Provenance via Context-Aware Lexical Substitution

Xi Yang, Kejiang Chen, Jie Zhang, Weiming Zhang, Nenghai Yu, Feng Wang, Zehua Ma

doi:10.1609/aaai.v36i10.21415

Abstract

Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help establish ownership of text content or identify malicious users who distribute misleading content such as machine-generated fake news. Prior attempts at this have mainly relied on watermarking. Traditional text watermarking methods embed watermarks by slightly altering the text format, such as line spacing and font, but these watermarks are fragile under cross-media transmission such as OCR. Natural language watermarking methods instead represent watermarks by replacing words in the original sentences with synonyms drawn from handcrafted lexical resources (e.g., WordNet), but they do not consider how a substitution affects the meaning of the whole sentence. More recently, a transformer-based network was proposed to embed watermarks by modifying unobtrusive words (e.g., function words), which also impairs the logical and semantic coherence of the sentence. Moreover, a network well trained on one type of text content fails on other types. To address these limitations, we propose a natural language watermarking scheme based on context-aware lexical substitution (LS). Specifically, we employ BERT to suggest LS candidates by inferring the semantic relatedness between the candidates and the original sentence. On this basis, we further design a selection strategy in terms of synchronicity and substitutability to test whether a word is suitable for carrying the watermark signal. Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme preserves the semantic integrity of the original sentences well and transfers better than existing methods. In addition, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark.
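To make the selection idea more concrete, here is a minimal sketch under heavily simplified assumptions. The candidate generator is a hypothetical fixed lookup table (TOY_CANDIDATES) standing in for BERT, so unlike the paper's approach it ignores sentence context. The names is_usable, embed, and extract are illustrative only and are not the authors' code, and the two checks are a toy simplification of the synchronicity and substitutability criteria. Each usable word carries one bit, written as the index (0 or 1) of the chosen word in a sorted candidate list; the synchronicity check requires every candidate to regenerate the same list, so the extractor finds the same carrier words and indices after substitution.

```python
from typing import Dict, List

# HYPOTHETICAL toy candidate table. The paper obtains LS candidates from BERT using
# the sentence context; a fixed word-keyed table is used here only to keep the
# sketch self-contained, so this version is NOT context-aware.
TOY_CANDIDATES: Dict[str, List[str]] = {
    "big": ["big", "large"],
    "large": ["large", "big"],
    "quick": ["quick", "fast"],
    "fast": ["fast", "quick"],
    "bright": ["bright", "smart"],
    "smart": ["smart", "clever"],  # differs from "bright"'s set: breaks synchronicity
}


def candidates(word: str) -> List[str]:
    """Return the candidate set for a word in a canonical (sorted) order."""
    return sorted(TOY_CANDIDATES.get(word, []))


def is_usable(word: str) -> bool:
    """Toy selection test (a hypothetical simplification of the paper's criteria).

    Substitutability: the word has at least two candidates, itself included.
    Synchronicity: every candidate regenerates the identical candidate set, so
    the extractor sees the same set whichever candidate was written.
    """
    cands = candidates(word)
    if len(cands) < 2 or word not in cands:
        return False
    return all(candidates(c) == cands for c in cands)


def embed(words: List[str], bits: List[int]) -> List[str]:
    """Carry one bit per usable word by writing the candidate at index 0 or 1."""
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    out: List[str] = []
    pending = list(bits)
    for w in words:
        if pending and is_usable(w):
            out.append(candidates(w)[pending.pop(0)])
        else:
            out.append(w)
    if pending:
        raise ValueError("not enough usable words to carry the watermark")
    return out


def extract(words: List[str], n_bits: int) -> List[int]:
    """Read back the first n_bits from the candidate index of each usable word."""
    bits = [candidates(w).index(w) for w in words if is_usable(w)]
    return bits[:n_bits]


marked = embed("the big dog was quick and bright".split(), [1, 0])
print(" ".join(marked))    # the large dog was fast and bright
print(extract(marked, 2))  # [1, 0]
```

In this toy table, "bright" is never used as a carrier because its candidate "smart" yields a different candidate set; if it were substituted, the extractor could no longer reconstruct the same set, which is the kind of desynchronization the check is meant to prevent.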

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 9

Similar Papers

Natural Language Watermarking by Morpheme Segmentation
Mi-Young Kim
01 Apr 2009

Natural language watermarking via paraphraser-based lexical substitution
Jipeng Qiang ... Xindong Wu
Artificial Intelligence | VOL. 317
16 Jan 2023

GeneSis: A Generative Approach to Substitutes in Context
...
15 Oct 2021

GeneSis: A Generative Approach to Substitutes in Context
Caterina Lacerra ... Roberto Navigli
01 Jan 2020
