Abstract
As machine learning models are increasingly deployed in practical applications and exposed through public APIs, the risk of model extraction attacks has grown. This study presents an efficient approach to model extraction that aims to reduce query cost while improving attack effectiveness. The method first uses a pre-trained model to identify high-confidence samples in unlabeled datasets. It then applies unsupervised contrastive learning to capture the structural characteristics of these samples, producing a high-quality dataset that covers a diverse range of features. A mixed information confidence strategy refines the query set so that the queries effectively probe the decision boundaries of the target model. Consistency regularization and pseudo-labeling reduce the reliance on ground-truth labels, improving both the feature extraction capability and the predictive accuracy of the surrogate models. Evaluation on four major datasets shows that the surrogate models produced by this method are functionally close to the original models, and a test against a real-world API achieves a success rate of 62.35%, which supports the method's validity.
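As a rough illustration of the first step described above (picking high-confidence samples with a pre-trained model), the following minimal sketch keeps only the samples whose top softmax probability reaches a threshold and records the predicted class as a provisional label. The abstract does not specify the selection criterion, so the max-softmax rule, the threshold of 0.9, the function names, and the example logits are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_high_confidence(logit_rows, threshold=0.9):
    """Return (index, predicted_class, confidence) for every row whose
    top probability is at least `threshold` (an assumed criterion)."""
    selected = []
    for i, logits in enumerate(logit_rows):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            selected.append((i, probs.index(conf), conf))
    return selected

# Hypothetical logits from a pre-trained model on three unlabeled samples.
logit_rows = [
    [5.0, 0.0, 0.0],  # strongly favors class 0
    [0.1, 0.0, 0.2],  # ambiguous, should be filtered out
    [0.0, 0.0, 6.0],  # strongly favors class 2
]
picked = select_high_confidence(logit_rows, threshold=0.9)
for index, label, conf in picked:
    print(f"sample {index}: class {label}, confidence {conf:.3f}")
```

A similar confidence threshold is commonly used when assigning pseudo-labels during semi-supervised training, although the exact rule used in this study is not stated in the abstract.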