Abstract

Natural language generation (NLG) models have attracted extensive attention and found wide application thanks to powerful deep learning techniques. Many NLG models are encapsulated in cloud APIs, which have become important profitable services for commercial organizations. However, cloud platforms may suffer from model extraction attacks that imitate the functionality of these NLG models in practical applications, thereby infringing the intellectual property (IP) of the NLG APIs. Unfortunately, most current watermarking methods for protecting the IP of deep models are not directly applicable to NLG APIs. In addition, the watermarked texts generated by the baseline method are not sufficiently similar in semantics to the original texts, so they can be easily detected by attackers. To bridge these gaps, we propose a novel watermarking framework that embeds watermarks by lexically modifying the outputs of NLG models, together with a corresponding watermark identification method that identifies attackers and protects the IP of NLG APIs. Experimental results show that our proposed watermarking method not only generates watermarked texts with higher semantic similarity to the original texts, but also achieves better identification performance than the baseline method. Our watermarking method also performs well in other respects, including transferability, watermark undetectability, and robustness.
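The core idea of embedding a watermark through lexical modification can be illustrated with a minimal sketch. The synonym table, key-based bit selection, and function names below are illustrative assumptions for exposition, not the paper's actual method: a secret key deterministically picks which synonym of each watermarkable word appears in the output, and the identification step measures how often a suspect text agrees with the key's choices.

```python
import hashlib

# Hypothetical toy synonym table (assumption for illustration);
# a real system would use a curated lexical resource.
SYNONYMS = {
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
    "smart": ["clever", "bright"],
}

def bit_for(word: str, key: str) -> int:
    """Keyed pseudo-random bit deciding which synonym encodes the watermark."""
    digest = hashlib.sha256((key + word).encode()).digest()
    return digest[0] & 1

def embed_watermark(text: str, key: str) -> str:
    """Replace each watermarkable word with the synonym selected by the key."""
    out = []
    for word in text.split():
        if word in SYNONYMS:
            out.append(SYNONYMS[word][bit_for(word, key)])
        else:
            out.append(word)
    return " ".join(out)

def detect_watermark(text: str, key: str) -> float:
    """Fraction of watermarkable slots whose synonym matches the key's choice."""
    hits = total = 0
    for word in text.split():
        for base, syns in SYNONYMS.items():
            if word in syns:
                total += 1
                if word == syns[bit_for(base, key)]:
                    hits += 1
    return hits / total if total else 0.0
```

On texts produced by an extracted model that copies the watermarked outputs, the agreement rate approaches 1.0, while unwatermarked texts agree only by chance, which is the statistical signal used to identify attackers.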
