Abstract

As an emerging digital product, artificial intelligence models face the risk of being modified. Unlike normal modifications, malicious tampering severely damages model functionality. Moreover, localizing the tampered regions enables targeted repair and effectively reduces cost. It is therefore crucial to achieve model content authentication and to locate the tampering. In this paper, we propose a novel semi-fragile neural network watermarking method to address these issues. Specifically, under the precondition of preserving model performance, the method generates a set of semi-fragile samples for a model to achieve content authentication and tampering localization. Experimental results show that content authentication can be achieved by analyzing the model's outputs on the semi-fragile samples: when the model undergoes normal processing, the outputs remain consistent with the expected labels, whereas when the model is maliciously tampered with, it produces unstable outputs. Furthermore, tamper localization can be achieved through the information hidden in the semi-fragile samples, with an average localization accuracy of more than 99.42%. Our method is also effective for other deep neural networks.
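To make the verification idea concrete, below is a minimal sketch of how content authentication could be carried out with such semi-fragile samples: query the model with the samples and compare its predictions against the labels recorded when the watermark was embedded. The framework (PyTorch) and all names (`verify_model`, `semi_fragile_samples`, `expected_labels`) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of semi-fragile-sample-based content authentication (assumed names,
# PyTorch assumed as the framework; not the authors' released code).
import torch


def verify_model(model, semi_fragile_samples, expected_labels):
    """Query the model with the semi-fragile samples and compare its
    predictions with the expected labels stored at embedding time.

    Returns True if every prediction matches (model judged intact or only
    normally processed), False otherwise (possible malicious tampering).
    """
    model.eval()
    with torch.no_grad():
        logits = model(semi_fragile_samples)   # shape: (N, num_classes)
        predictions = logits.argmax(dim=1)     # predicted class per sample
    return bool(torch.equal(predictions, expected_labels))
```

In this sketch, a mismatch on any semi-fragile sample flags possible malicious tampering; a follow-up step would then use the information hidden in those samples to localize which part of the model was altered.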
