Abstract

In a model inversion attack, an adversary attempts to reconstruct the training data records of a target model using only the model’s output. Contemporary model inversion attacks generally rely on either the predicted confidence score vectors, i.e., black-box attacks, or the parameters of the target model, i.e., white-box attacks. In the real world, however, model owners usually release only the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism against such attacks. Unfortunately, we have found a model inversion method that can reconstruct representative samples of the target model’s training data from the output labels alone. We believe this attack requires the least information to succeed and therefore has the broadest applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. This distance is then used to generate confidence score vectors, which are in turn used to train an attack model that reconstructs the representative samples. Experimental results show that highly recognizable representative samples can be reconstructed with far less information than existing methods require.
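To make the label-only setting concrete, the following is a minimal sketch, assuming a toy linear target model and a per-record binary search to estimate the distance to the decision boundary; the paper instead derives a median distance from the target model's error rate, a derivation not reproduced here. The names `predict_label`, `distance_to_boundary`, and `synthetic_confidence`, and the sigmoid calibration, are assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the target model: a linear binary classifier that,
# like a real deployed model, reveals only its predicted label.
w, b = np.array([1.5, -2.0]), 0.3

def predict_label(x):
    return int(x @ w + b > 0)

def distance_to_boundary(x, direction, hi=10.0, tol=1e-4):
    """Estimate the distance from x to the decision boundary along
    `direction` by binary search, using only predicted labels."""
    y0 = predict_label(x)
    if predict_label(x + hi * direction) == y0:
        return None  # boundary not crossed within the search radius
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict_label(x + mid * direction) == y0:
            lo = mid
        else:
            hi = mid
    return hi

# Median boundary distance over a set of data records. (The paper derives
# this quantity from the target model's error rate; per-record binary
# search is a simplified stand-in used here for illustration.)
records = rng.normal(size=(200, 2))
dists = []
for x in records:
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)
    d = distance_to_boundary(x, direction)
    if d is not None:
        dists.append(d)
median_dist = float(np.median(dists))

def synthetic_confidence(d, scale=median_dist):
    """Map an estimated boundary distance to a pseudo confidence score
    vector: far from the boundary -> near 1, close to it -> near 0.5.
    The sigmoid calibration is an assumption of this sketch."""
    p = 1.0 / (1.0 + np.exp(-d / scale))
    return np.array([1.0 - p, p])  # scores for the two classes

print(f"median distance ~ {median_dist:.3f}")
print("example synthetic score vector:", synthetic_confidence(dists[0]))
```

In the paper's pipeline, score vectors synthesized in this spirit substitute for the hidden confidences and are fed to an attack model that reconstructs representative samples; the sketch stops at score synthesis.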
