Data Isotopes for Data Provenance in DNNs

Emily Wenger, Ben Y. Zhao, Xiuyu Li, Vitaly Shmatikov

doi:10.56553/popets-2024-0024

Abstract

Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over, or knowledge of, when their data, and in particular their images, are used to train models. To empower users to counteract unwanted use of their images, we design, implement, and evaluate a practical system that enables users to detect whether their data was used to train a DNN model for image classification. We show how users can create special images we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a model, no knowledge of the model-training process, and no control over the data labels, a user can apply statistical hypothesis testing to detect whether the model learned these spurious features by training on the user's images. Isotopes can be viewed as an application of a particular type of data poisoning. In contrast to backdoors and other poisoning attacks, our purpose is not to cause misclassification but rather to create tell-tale changes in the confidence scores output by the model that reveal the presence of isotopes in the training data. Isotopes thus turn DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple image classification settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and on larger models such as those trained on ImageNet, can use physical objects in images instead of digital marks, and remains robust against several adaptive countermeasures.
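The detection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says "statistical hypothesis testing", so the paired sign-flip permutation test below is an assumed stand-in, and `query_prob`, `isotope_p_value`, and the toy models are hypothetical names introduced purely for illustration. The idea is to query the model on pairs of images that differ only in the isotope mark and to ask whether the mark raises the confidence score for the associated label more than chance would explain.

```python
import random
from typing import Callable, Sequence, Tuple

# Toy stand-in for an image: (base confidence for the label, has isotope mark).
# Real images and a real model API would replace this; both are assumptions.
Image = Tuple[float, bool]


def isotope_p_value(
    query_prob: Callable[[Image], float],
    marked: Sequence[Image],
    unmarked: Sequence[Image],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided paired sign-flip permutation test (an assumed choice of test).

    H0: adding the mark does not raise the model's confidence for the label.
    Returns a p-value; small values suggest the model learned the mark.
    """
    # Per-pair shift in confidence caused by adding the mark.
    diffs = [query_prob(m) - query_prob(u) for m, u in zip(marked, unmarked)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under H0 the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if flipped >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical models: one reacts to the mark, one ignores it.
def model_trained_on_isotopes(img: Image) -> float:
    base, has_mark = img
    return base + (0.2 if has_mark else 0.0)


def model_without_isotopes(img: Image) -> float:
    base, _ = img
    return base


bases = [0.1 + 0.02 * i for i in range(20)]
unmarked_imgs = [(b, False) for b in bases]
marked_imgs = [(b, True) for b in bases]

p_trained = isotope_p_value(model_trained_on_isotopes, marked_imgs, unmarked_imgs)
p_clean = isotope_p_value(model_without_isotopes, marked_imgs, unmarked_imgs)
print(f"p-value (model reacts to mark): {p_trained:.4f}")
print(f"p-value (model ignores mark):   {p_clean:.4f}")
```

In this toy setup the user would flag the first model and not the second at a conventional significance threshold. A real deployment would need only the confidence scores returned by the model's query interface, which matches the query-access assumption in the abstract.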

Journal: Proceedings on Privacy Enhancing Technologies	Publication Date: Jan 1, 2024
License type: CC BY 4.0

Similar Papers

On the Impact of Spurious Correlation for Out-of-Distribution Detection
Yifei Ming ... Yixuan Li
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

Mitigating Spurious Correlations for Self-supervised Recommendation
Xin-Yu Lin ... Fu-Li Feng
Machine Intelligence Research | VOL. 20
14 Jan 2023

Performance of deep learning models for classifying and detecting common weeds in corn and soybean production systems
Aanis Ahmad ... Benjamin Hancock
Computers and Electronics in Agriculture | VOL. 184
13 Mar 2021

Local visual pattern modelling for image and video classification
Peng Wang
21 Apr 2017
