Privacy-Preserving Distributed Multi-Task Learning against Inference Attack in Cloud Computing

Abstract

Because of the powerful computing and storage capabilities of cloud computing, machine learning as a service (MLaaS) has recently been adopted by organizations for machine learning training over related, representative datasets. When these datasets are collected from different organizations and have different distributions, multi-task learning (MTL) is usually used to improve generalization performance by scheduling the related training tasks onto virtual machines in MLaaS and transferring the related knowledge between those tasks. However, because of concerns about privacy breaches (e.g., property inference attacks and model inversion attacks), organizations cannot directly outsource their training data to MLaaS or share their extracted knowledge in plaintext, especially organizations in sensitive domains. In this article, we propose a novel privacy-preserving mechanism for distributed MTL, namely NOInfer, that allows several task nodes to train the model locally and transfer their shared knowledge privately. Specifically, we construct a single-server architecture to achieve private MTL, which protects task nodes' local data even if n-1 out of n nodes collude. Then, a new protocol for the Alternating Direction Method of Multipliers (ADMM) is designed to perform privacy-preserving model training, which resists inference attacks on the intermediate results and ensures that the training efficiency is independent of the number of training samples. When releasing the trained model, we also design a differentially private model releasing mechanism to resist membership inference attacks. Furthermore, we analyze the privacy preservation and efficiency of NOInfer in theory. Finally, we evaluate NOInfer over two testing datasets; the results demonstrate that NOInfer achieves distributed MTL efficiently and effectively.
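The abstract does not spell out the releasing mechanism itself; a common instantiation of differentially private model release is output perturbation with the Gaussian mechanism. Below is a minimal sketch under that assumption (function and parameter names are hypothetical, not NOInfer's actual API), for a released model taken to be a weight vector with a known L2 sensitivity:

```python
import numpy as np

def release_model_dp(weights, l2_sensitivity, epsilon, delta):
    """Release model weights with (epsilon, delta)-DP via the Gaussian mechanism.

    sigma follows the classic analytic bound:
    sigma >= sqrt(2 ln(1.25/delta)) * S / epsilon  (valid for epsilon < 1).
    """
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    noise = np.random.normal(0.0, sigma, size=weights.shape)
    return weights + noise

# Toy usage: perturb a trained weight vector before publishing it.
w_trained = np.array([0.8, -1.2, 0.35])
w_public = release_model_dp(w_trained, l2_sensitivity=0.1, epsilon=0.5, delta=1e-5)
print(w_public)
```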

Similar Papers
  • Conference Article
  • Cite Count: 3
  • 10.1109/globecom46510.2021.9685603
Data Privacy Protection based on Feature Dilution in Cloud Services
  • Dec 1, 2021
  • Feng Wu + 5 more

Machine learning as a service (MLaaS) brings many benefits to daily life. However, the MLaaS service mode increases the risk of leaking users' privacy. Existing privacy-preserving works based on encryption, differential privacy, and distributed frameworks either require substantial computing resources or cannot be applied in MLaaS. In this paper, we propose feature dilution (FD), a noise-based desensitization algorithm that removes sensitive information from raw data. In particular, FD continuously adds raw-data features to random noise until the mixture meets the minimum requirement for an effective query; we call this noise weak-feature noise (WFN). By fine-tuning the MLaaS architecture, users can employ WFN to obtain normal services without exposing their local private data. Meanwhile, we introduce a noise-addition technique to reduce the risk of privacy leakage caused by the remaining "weak features". Extensive experiments demonstrate that users can use FD to obtain effective services without exposing their private data. Finally, we conducted practical tests on weak-feature noise and found that it is difficult for malicious service providers to exploit.
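As described, FD mixes raw features into random noise only until the mixture is just useful enough for a query. A minimal sketch of that loop, where a caller-supplied utility score stands in for the paper's "effective query" test (the linear blending rule and all names are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def feature_dilution(x, utility, min_utility, step=0.05, max_alpha=0.5):
    """Blend raw features into random noise until the mix supports an effective query.

    x: raw feature vector; utility: callable scoring query effectiveness (hypothetical);
    returns the weak-feature noise (WFN) sent to the service instead of x.
    """
    noise = np.random.normal(0.0, 1.0, size=x.shape)
    alpha = 0.0
    wfn = noise
    while utility(wfn) < min_utility and alpha < max_alpha:
        alpha += step                          # add a little more raw signal
        wfn = (1 - alpha) * noise + alpha * x
    return wfn

# Toy utility: cosine similarity to the raw features (stand-in for "effective query").
x = np.random.randn(16)
cos = lambda v: float(v @ x / (np.linalg.norm(v) * np.linalg.norm(x) + 1e-9))
print(feature_dilution(x, cos, min_utility=0.3).shape)
```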

  • Research Article
  • 10.1016/j.ipm.2024.103947
Membership inference attacks via spatial projection-based relative information loss in MLaaS
  • Nov 4, 2024
  • Information Processing and Management
  • Zehua Ding + 5 more


  • Research Article
  • Cite Count: 36
  • 10.1109/tsusc.2019.2930526
MIASec: Enabling Data Indistinguishability Against Membership Inference Attacks in MLaaS
  • Jul 1, 2020
  • IEEE Transactions on Sustainable Computing
  • Chen Wang + 5 more

The emergence of machine learning has greatly advanced computational sustainability in natural resource management and allocation. Many Internet giants, such as Google, Amazon, and Microsoft, now provide Machine Learning as a Service (MLaaS) to meet the increasing demand for machine learning services. However, the prediction results for training data and testing data under the same machine learning model in MLaaS differ noticeably, so attackers can leverage machine learning techniques to launch so-called membership inference attacks, i.e., to infer whether a record is in the training data or not. In this paper, we propose MIASec, which guarantees the indistinguishability of the training data and thereby defends against membership inference attacks in MLaaS. The key idea of MIASec is to narrow the dynamic ranges of vital features in the training data, so that the training data, the testing data, and even synthetic data yield nearly indistinguishable prediction results under the same machine learning model. Through an elaborated design for modifying the values of vital features in the training data, MIASec reduces the differences between the model's outputs on training and testing data, thereby protecting the training data while keeping the model's accuracy stable. We empirically evaluate MIASec on machine learning models trained by offline neural networks and online MLaaS. Using realistic data and classification tasks, our experimental results show that MIASec defends against membership inference attacks effectively. In particular, MIASec reduces the precision and recall of attacks by 11.7 and 15.4 percent on average, respectively, and by 18.6 and 21.8 percent at best.
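One plausible reading of "narrowing the dynamic ranges of vital features" is quantile clipping of the selected columns; the toy sketch below works under that assumption (the function name, quantile bounds, and choice of vital features are all hypothetical):

```python
import numpy as np

def narrow_feature_ranges(X, vital_idx, low_q=0.25, high_q=0.75):
    """Clip the vital features of training data into a narrower quantile band,
    so training and non-training records yield more similar model outputs."""
    X = X.copy()
    for j in vital_idx:
        lo, hi = np.quantile(X[:, j], [low_q, high_q])
        X[:, j] = np.clip(X[:, j], lo, hi)
    return X

# Toy usage: narrow features 0 and 2 of a random training matrix.
X_train = np.random.randn(100, 4)
X_def = narrow_feature_ranges(X_train, vital_idx=[0, 2])
print(X_def[:, 0].min(), X_def[:, 0].max())
```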

  • Research Article
  • Cite Count: 3
  • 10.1155/2021/9924684
A Defense Framework for Privacy Risks in Remote Machine Learning Service
  • Jun 18, 2021
  • Security and Communication Networks
  • Yang Bai + 3 more

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models deal with collective sensitive data and are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the cloud's computation capability to train models, or users directly access models trained by MLaaS. Unfortunately, recent works reveal that a curious server (which trains the model with users' sensitive local data and is curious about information on individuals) and a malicious MLaaS user (who abuses queries to the MLaaS system) both pose privacy risks. The adversarial method, as one typical mitigation, has been studied by several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy heavily decreases the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can adopt several adversarial algorithms to generate adversarial examples directly from data owners' original data. By doing so, sensitive information about the original data is hidden. Then, we explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against membership inference attacks (MIA) and that our defense framework with the FGSM method can protect sensitive data from direct content-exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
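The FGSM variant mentioned at the end perturbs a record along the sign of the loss gradient. A self-contained sketch of one FGSM step on a toy logistic-regression "model" (the paper's framework would substitute the data owner's actual model; all names here are illustrative):

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.1):
    """One FGSM step on a logistic-regression loss: x_adv = x + eps * sign(dL/dx)."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))           # sigmoid prediction
    grad_x = (p - y) * w                    # dL/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)

# Toy usage on a single record.
w, b = np.array([0.5, -0.3, 0.8]), 0.1
x, y = np.array([1.0, 2.0, -1.0]), 1.0
print(fgsm_perturb(x, y, w, b))
```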

  • Conference Article
  • Cite Count: 4
  • 10.1109/satml54575.2023.00045
PolyKervNets: Activation-free Neural Networks For Efficient Private Inference
  • Feb 1, 2023
  • Toluwani Aremu + 1 more

With the advent of cloud computing, machine learning as a service (MLaaS) has become a growing phenomenon with the potential to address many real-world problems. In an untrusted cloud environment, the privacy concerns of users are a major impediment to the adoption of MLaaS. To alleviate these privacy issues and preserve data confidentiality, several private inference (PI) protocols have been proposed in recent years based on cryptographic tools such as Fully Homomorphic Encryption (FHE) and Secure Multiparty Computation (MPC). Deep neural networks (DNNs) have been the architecture of choice in most MLaaS deployments. One of the core challenges in developing PI protocols for DNN inference is the substantial cost of implementing non-linear activation layers such as the Rectified Linear Unit (ReLU). This has spawned a search for accurate but efficient approximations of the ReLU function and for neural architectures that operate on a stringent ReLU budget. While these methods improve efficiency and ensure data confidentiality, they often come at a significant cost to prediction accuracy. In this work, we propose a DNN architecture based on polynomial kervolution called PolyKervNet (PKN), which completely eliminates the need for non-linear activation and max pooling layers. PolyKervNets are both FHE- and MPC-friendly: they enable FHE-based encrypted inference without any approximations and improve the latency of MPC-based PI protocols without any use of garbled circuits. We demonstrate that it is possible to redesign standard convolutional neural network (CNN) architectures such as ResNet-18 and VGG-16 with polynomial kervolution and achieve up to a 30× improvement in the latency of MPC-based PI with minimal loss in accuracy on many image classification tasks.
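Kervolution replaces the inner product inside a convolution with a kernel function; with a polynomial kernel the layer is itself non-linear, which is why ReLU and max pooling can be dropped. A 1-D toy sketch of that idea (a plain loop for clarity, not the paper's optimized architecture; names and the kernel parameters are illustrative):

```python
import numpy as np

def poly_kervolution1d(x, w, c=1.0, d=2):
    """1-D 'kervolution': replace the convolution inner product <patch, w>
    with a polynomial kernel (patch . w + c)^d, removing the need for ReLU."""
    k = len(w)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        patch = x[i:i + k]
        out[i] = (patch @ w + c) ** d
    return out

# Toy usage: a length-8 signal with a width-3 polynomial kernel of degree 2.
x = np.arange(8, dtype=float)
w = np.array([0.2, -0.1, 0.4])
print(poly_kervolution1d(x, w))
```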

  • Research Article
  • Cite Count: 23
  • 10.1016/j.neucom.2019.02.035
Multi-task feature selection with sparse regularization to extract common and task-specific features
  • Feb 22, 2019
  • Neurocomputing
  • Jiashuai Zhang + 3 more


  • Book Chapter
  • Cite Count: 4
  • 10.1007/978-3-031-36402-0_28
Artificial Intelligence as a Service: Providing Integrity and Confidentiality
  • Jan 1, 2023
  • Neelima Guntupalli + 1 more

As Artificial Intelligence technologies are being vigorously adopted, there are major concerns about the privacy, security, and compression of data. Bulk amounts of data are stored in the cloud and transmitted to parties that offer AI software or platform services. The three key features to be considered when transferring data from the cloud to Artificial Intelligence as a Service (AIaaS) or Machine Learning as a Service (MLaaS) are data compression, data integrity, and data confidentiality. The high demand for data processing drives the need for data compression: without compression, such huge amounts of data, whether text or multimedia, cannot be stored as-is, regardless of whether Artificial Intelligence, cloud computing, or machine learning algorithms are involved. In this paper, we use an optimized lossless compression algorithm. When bulk amounts of data are transferred from platform services to the cloud, the first step is the categorization of the data, i.e., separating critical data that needs integrity protection from data that can be read by users on the network. Critical data that needs AI services should be checked to confirm it is transmitted unmodified: the data sent by the cloud user should reach the service-providing platforms without modification. To maintain such integrity, hashing can be used. In this paper, we propose a hashing algorithm that is applied after data compression; the generated hash value is used as an attribute in generating keys for encryption.
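A minimal sketch of the compress-then-hash-then-derive-key pipeline the chapter describes, using standard-library primitives (zlib, SHA-256, PBKDF2) as stand-ins for the chapter's own optimized compression and hashing algorithms; the function name and salt are hypothetical:

```python
import zlib
import hashlib

def compress_hash_key(data: bytes, salt: bytes) -> tuple[bytes, bytes]:
    """Losslessly compress the payload, hash the compressed bytes for an
    integrity check, and derive an encryption key from that hash."""
    compressed = zlib.compress(data, level=9)                   # lossless compression
    digest = hashlib.sha256(compressed).digest()                # integrity hash
    key = hashlib.pbkdf2_hmac("sha256", digest, salt, 100_000)  # key derived from hash
    return compressed, key

payload = b"bulk sensor data destined for the AIaaS platform"
blob, key = compress_hash_key(payload, salt=b"per-session-salt")
assert zlib.decompress(blob) == payload                         # integrity round-trip
print(len(blob), key.hex()[:16])
```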

  • Research Article
  • Cite Count: 68
  • 10.1109/tkde.2020.2969633
Secure and Efficient Outsourced k-Means Clustering using Fully Homomorphic Encryption With Ciphertext Packing Technique
  • Feb 7, 2020
  • IEEE Transactions on Knowledge and Data Engineering
  • Wei Wu + 4 more

Nowadays, more individuals and corporations tend to use machine learning as a service (MLaaS) in cloud computing environments. However, while enjoying the pay-as-you-go mode and flexible capacity of cloud computing, they also face an increased risk of privacy leakage for sensitive data. In this paper, we aim to efficiently implement privacy-preserving MLaaS, and we focus on k-means clustering over outsourced encrypted cloud databases. Previous works mainly utilize partially homomorphic encryption, which requires a great number of interactive protocols with high computation and communication costs, making them impractical in real-world applications. To better solve this problem, we propose a new secure and efficient outsourced k-means clustering (SEOKC) scheme using fully homomorphic encryption with a ciphertext packing technique, which achieves parallel computation without extra cost. The proposed scheme preserves privacy in three aspects: (1) database security, (2) privacy of clustering results, and (3) hiding of data access patterns. We provide a formal security analysis and evaluate the performance of the proposed scheme through extensive experiments. The experimental results show that our scheme needs much less computation cost (more than three orders of magnitude lower) than state-of-the-art schemes and is suitable for application to large databases.
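Ciphertext packing batches many plaintext slots into one ciphertext so that a single homomorphic operation acts on all of them at once. The plaintext sketch below only mirrors that SIMD pattern with numpy broadcasting for the k-means assignment step; no FHE library is invoked, so this illustrates the batching idea, not the SEOKC protocol itself:

```python
import numpy as np

def packed_assign(points, centroids):
    """Assign each point to its nearest centroid with one vectorized (SIMD-style)
    squared-distance computation, echoing how packing lets an FHE scheme
    evaluate many distances in a single homomorphic operation."""
    # ||p - c||^2 for all (point, centroid) pairs in one batched expression.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

pts = np.random.randn(200, 2)
cts = np.array([[0.0, 0.0], [3.0, 3.0]])
print(np.bincount(packed_assign(pts, cts)))
```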

  • Conference Article
  • Cite Count: 2
  • 10.1109/sagc50777.2020.00014
Distributed Task Offloading and Resource Allocation in Vehicular Edge Computing
  • Dec 1, 2020
  • Shichao Li + 3 more

With its powerful storage and computation capability, vehicular edge computing (VEC) is considered a promising paradigm to enhance the safety and service quality of vehicles in intelligent transportation systems (ITS). In this paper, we formulate a joint road side unit (RSU) selection and resource allocation problem that minimizes the total task offloading delay subject to bandwidth and computation resource constraints in a VEC system. Since the formulated problem is a mixed-integer nonlinear programming (MINLP) problem, it is first reformulated as a convex problem. Due to the problem's high complexity, we then decompose it into a distributed form. Utilizing the alternating direction method of multipliers (ADMM), a joint RSU selection and resource allocation (JRSRA) algorithm is proposed with low complexity. Simulation results show that the proposed JRSRA algorithm reduces the total task offloading delay compared with other benchmark methods.
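The pipeline the abstract describes (convexify, decompose, solve distributively with ADMM) follows the standard consensus-ADMM template. A scalar toy version of that template, not the JRSRA algorithm itself (problem, variable names, and parameters are all illustrative):

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=50):
    """Toy consensus ADMM: n agents minimize sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z.

    Each agent updates locally and only shares x_i + u_i with the coordinator,
    the same decomposition pattern a distributed JRSRA-style solver relies on."""
    n = len(a)
    x, u, z = np.zeros(n), np.zeros(n), 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # local primal updates
        z = np.mean(x + u)                       # consensus (global) update
        u = u + x - z                            # scaled dual updates
    return z

a = np.array([1.0, 4.0, 7.0])
print(consensus_admm(a))   # converges to the consensus optimum, mean(a) = 4.0
```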

  • Conference Article
  • Cite Count: 1
  • 10.1109/ciss56502.2023.10089781
AI/ML Systems Engineering Workbench Framework
  • Mar 22, 2023
  • Kofi Nyarko + 3 more

This paper presents the framework of a cloud-based Artificial Intelligence (AI) and Machine Learning (ML) workbench that provides services utilization and performance benchmarking. The framework promotes convenience by providing a centralized platform where software developers and data scientists can perform federated search across various dataset repositories, choose problem domains such as Natural Language Processing, Speech, and Computer Vision, and build and validate models. The benchmarking functionality of this framework helps users evaluate and compare the performance of various solutions from multiple cloud service providers. The workbench framework consists of two primary layers. The Services layer is rendered as an AI as a Service (AIaaS) model, providing interfaces that connect users to vision, speech, and natural language processing (NLP) services from various AI service providers. The Platform layer is an ML as a Service (MLaaS) model providing access to ML model training, tuning, inference, and transfer learning tasks that can be fulfilled on multiple cloud ML platforms with preset cloud-based compute instances. Benchmarking is provided on the workbench by comparing accuracy metrics on prediction and detection counts, F1 scores, and ML training instance setup and completion times. By utilizing these performance benchmarking features, the framework can assist AI and ML practitioners in making informed judgments when selecting a cloud provider for specific activities. Additionally, it will increase the effectiveness and efficiency of data science training for both teachers and students.
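A tiny sketch of the kind of F1 comparison the benchmarking layer reports; the provider names and confusion counts are made up for illustration:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts, the metric the workbench compares across providers."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-provider detection counts (tp, fp, fn) compared side by side.
providers = {"provider_a": (90, 10, 15), "provider_b": (85, 5, 20)}
for name, counts in providers.items():
    print(name, round(f1_score(*counts), 3))
```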

  • Conference Article
  • Cite Count: 827
  • 10.14722/ndss.2019.23119
ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models
  • Jan 1, 2019
  • Ahmed Salem + 5 more

Machine learning (ML) has become a core component of many real-world applications, and training data is a key factor driving current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack showed that extracting information about the training set is possible in such MLaaS settings, which has severe security and privacy implications. However, the early demonstrations of the feasibility of such attacks made many assumptions about the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model's training data. We relax all of these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and therefore pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging threat, using eight diverse datasets to show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against this broader class of membership inference attacks that maintain a high level of utility of the ML model.
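The most relaxed adversary studied in this line of work needs neither shadow models nor knowledge of the target architecture: it simply thresholds the model's top posterior, on the intuition that overconfident predictions suggest memorized training members. A sketch of that test (the threshold here is arbitrary; in practice it would be tuned on auxiliary data):

```python
import numpy as np

def confidence_mia(posteriors, threshold=0.9):
    """Model- and data-independent membership test: flag a record as a training
    member when the model's maximum posterior exceeds a threshold."""
    return posteriors.max(axis=1) >= threshold

# Toy usage: two confident predictions and one uncertain one.
post = np.array([[0.97, 0.03], [0.55, 0.45], [0.99, 0.01]])
print(confidence_mia(post))   # [ True False  True]
```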

  • Research Article
  • Cite Count: 5
  • 10.3390/e24050643
Privacy-Preserving Image Template Sharing Using Contrastive Learning
  • May 3, 2022
  • Entropy
  • Shideh Rezaeifar + 3 more

With the recent developments of Machine Learning as a Service (MLaaS), various privacy concerns have been raised. Having access to the user’s data, an adversary can design attacks with different objectives, namely, reconstruction or attribute inference attacks. In this paper, we propose two different training frameworks for an image classification task while preserving user data privacy against the two aforementioned attacks. In both frameworks, an encoder is trained with contrastive loss, providing a superior utility-privacy trade-off. In the reconstruction attack scenario, a supervised contrastive loss was employed to provide maximal discrimination for the targeted classification task. The encoded features are further perturbed using the obfuscator module to remove all redundant information. Moreover, the obfuscator module is jointly trained with a classifier to minimize the correlation between private feature representation and original data while retaining the model utility for the classification. For the attribute inference attack, we aim to provide a representation of data that is independent of the sensitive attribute. Therefore, the encoder is trained with supervised and private contrastive loss. Furthermore, an obfuscator module is trained in an adversarial manner to preserve the privacy of sensitive attributes while maintaining the classification performance on the target attribute. The reported results on the CelebA dataset validate the effectiveness of the proposed frameworks.
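A numpy sketch of a supervised contrastive loss of the kind the encoder is trained with, following the standard SupCon formulation (batch layout, temperature, and names are illustrative assumptions, not the paper's exact training code):

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss on embeddings z (n x d): pull together
    same-label pairs, push apart the rest."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / tau                                 # temperature-scaled similarities
    n = len(z)
    loss = 0.0
    for i in range(n):
        pos = (labels == labels[i])
        pos[i] = False                                  # exclude the anchor itself
        if not pos.any():
            continue
        others = np.arange(n) != i
        log_denom = np.log(np.exp(sim[i][others]).sum())
        loss += -(sim[i][pos] - log_denom).mean()       # mean -log p over positives
    return loss / n

z = np.random.randn(8, 4)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
print(supcon_loss(z, y))
```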

  • Research Article
  • Cite Count: 18
  • 10.1016/j.compbiomed.2021.105090
Dual feature correlation guided multi-task learning for Alzheimer's disease prediction
  • Dec 1, 2021
  • Computers in Biology and Medicine
  • Shanshan Tang + 4 more


  • Research Article
  • Cite Count: 11
  • 10.1016/j.compbiomed.2022.106367
Exploiting task relationships for Alzheimer’s disease cognitive score prediction via multi-task learning
  • Dec 7, 2022
  • Computers in Biology and Medicine
  • Wei Liang + 5 more


  • Conference Article
  • Cite Count: 10
  • 10.1145/3340531.3411860
Towards Plausible Differentially Private ADMM Based Distributed Machine Learning
  • Oct 19, 2020
  • Jiahao Ding + 4 more

The Alternating Direction Method of Multipliers (ADMM) and its distributed version have been widely used in machine learning. In the iterations of ADMM, model updates using local private data and model exchanges among agents raise critical privacy concerns. Despite some pioneering works to relieve such concerns, differentially private ADMM still confronts many research challenges. For example, the guarantee of differential privacy (DP) relies on the premise that the optimum of each local problem can be perfectly attained in each ADMM iteration, which may never happen in practice. Moreover, the model trained by DP ADMM may have low prediction accuracy. In this paper, we address these concerns by proposing a novel (Improved) Plausible differentially Private ADMM algorithm, called PP-ADMM (and IPP-ADMM). In PP-ADMM, each agent approximately solves a perturbed optimization problem formulated from its local private data in each iteration, and then perturbs the approximate solution with Gaussian noise to provide the DP guarantee. To further improve model accuracy and convergence, the improved version IPP-ADMM adopts the sparse vector technique (SVT) to determine whether an agent should update its neighbors with the current perturbed solution: the agent calculates the difference between the current solution and that of the last iteration, and if the difference is larger than a threshold, it passes the solution to its neighbors; otherwise the solution is discarded. Moreover, we propose to track the total privacy loss under zero-concentrated DP (zCDP) and provide a generalization performance analysis. Experiments on real-world datasets demonstrate that, under the same privacy guarantee, the proposed algorithms are superior to the state of the art in terms of model accuracy and convergence rate.
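A sketch of the per-iteration sharing step as the abstract describes it: Gaussian-perturb the approximate local solution, then (as in IPP-ADMM's SVT idea) forward it only when it has moved enough since the last shared iterate. Note that the real SVT also randomizes the threshold comparison itself; this toy omits that, and all names are illustrative:

```python
import numpy as np

def pp_admm_share(local_solution, prev_shared, sigma, threshold):
    """Perturb the approximate local ADMM solution with Gaussian noise for DP,
    then transmit it only if it differs enough from the previously shared one."""
    noisy = local_solution + np.random.normal(0.0, sigma, size=local_solution.shape)
    if np.linalg.norm(noisy - prev_shared) > threshold:
        return noisy, True      # pass the perturbed solution to neighbors
    return prev_shared, False   # discard the update; neighbors keep the old one

x_local = np.array([0.42, -0.13])
shared, sent = pp_admm_share(x_local, prev_shared=np.zeros(2), sigma=0.05, threshold=0.1)
print(sent, shared)
```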
