TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models

Weichuan Mo, Kongyang Chen, Yatie Xiao

doi:10.3390/math13020272


Open Access

https://doi.org/10.3390/math13020272


Journal: Mathematics
Publication Date: Jan 15, 2025
License type: CC BY 4.0


Abstract


Pre-trained language models such as BERT, GPT-3, and T5 have driven significant advances in natural language processing (NLP). However, their widespread adoption raises concerns about intellectual property (IP) protection, as unauthorized use can undermine innovation. Watermarking has emerged as a promising solution for model ownership verification, but its application to NLP models presents unique challenges, particularly in ensuring robustness against fine-tuning and preventing interference with downstream tasks. This paper presents a novel watermarking scheme, TIBW (Task-Independent Backdoor Watermarking), that embeds robust, task-independent backdoor watermarks into pre-trained language models. By implementing a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity, our approach ensures that the watermark remains effective even after extensive fine-tuning. Additionally, we introduce Parameter Relationship Embedding (PRE) to subtly modify the model's embedding layer, reinforcing the association between trigger and target words without degrading model performance. We also design a comprehensive watermark verification process that evaluates task behavior consistency, quantified by the Watermark Embedding Success Rate (WESR). Our experiments across five benchmark NLP tasks demonstrate that the proposed watermarking method maintains near-baseline performance on clean inputs while achieving a high WESR, outperforming existing baselines in both robustness and stealthiness. Furthermore, the watermark persists reliably even after additional fine-tuning, highlighting its resilience against potential watermark removal attempts. This work provides a secure and reliable IP protection mechanism for NLP models, ensuring watermark integrity across diverse applications.
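The abstract names a trigger–target search based on maximal semantic dissimilarity and a consistency-based success rate, but does not spell out either one. The sketch below is only an illustration of those two ideas under stated assumptions: dissimilarity is taken to be lowest cosine similarity between word embeddings, the search is a brute-force scan over a small candidate vocabulary, and WESR is taken to be the fraction of inputs on which the model gives the same output for the trigger version and the target version. The function names `most_dissimilar_pair` and `wesr` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_dissimilar_pair(embeddings):
    """Return ((word_a, word_b), similarity) for the least similar word pair.

    Assumption: "maximal semantic dissimilarity" is approximated here by
    minimal cosine similarity; the paper may use a different measure.
    `embeddings` maps each candidate word to its embedding vector.
    """
    best_pair, best_sim = None, float("inf")
    # Sorting makes the scan order, and therefore tie-breaking, deterministic.
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim < best_sim:
            best_pair, best_sim = (a, b), sim
    return best_pair, best_sim


def wesr(trigger_outputs, target_outputs):
    """Fraction of inputs with identical outputs for trigger and target.

    Assumption: behavior consistency means the downstream model predicts
    the same label when the trigger word is swapped for the target word.
    """
    if not trigger_outputs or len(trigger_outputs) != len(target_outputs):
        raise ValueError("need two non-empty output lists of equal length")
    matches = sum(1 for x, y in zip(trigger_outputs, target_outputs) if x == y)
    return matches / len(trigger_outputs)
```

In practice the embedding vectors would come from the pre-trained model's embedding layer and the candidate set would be far larger than a brute-force scan over all pairs can handle comfortably, so a real search would likely need pruning or approximate nearest-neighbor techniques.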
