Abstract

Split Learning (SL) is a distributed machine learning setting that allows several nodes to train neural networks via model parallelism. Since SL avoids sharing raw data among training nodes, it protects data privacy by design. However, recent studies show that raw data may be reconstructed from the activations exchanged during training, leading to privacy leakage. Besides raw data, label sharing in SL may also cause privacy problems. To address these issues, we propose a novel mechanism called multiple activations and labels mix (MALM). By exploiting the diversity of sample categories, MALM generates mixed activations that preserve a low distance correlation with the raw data, reducing the risk of reconstruction attacks. To protect label information, MALM creates obfuscated labels associated with the raw data, preventing adversaries from inferring the ground-truth labels. Since clients with few sample categories may be unable to generate effective mixed activations and obfuscated labels, we further propose a bipartite-graph-based assistant client matching technique for MALM, in which clients with many categories provide mixed activations and obfuscated labels to clients with few categories; the latter can then mix the received activations and labels with their own. Experimental results show that, compared with the baselines, MALM reduces the risk of raw data and label leakage at lower cost while achieving comparable or even better model performance.
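
The abstract does not spell out MALM's exact mixing rule. As a rough illustration of the general idea only, the sketch below mixes cut-layer activations and one-hot labels of samples drawn from different classes using random convex weights (a mixup-style scheme chosen for illustration, not the paper's construction), then measures the distance correlation between raw and mixed activations. The function names, the Dirichlet weights, and the parameter `k` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_activations_and_labels(acts, labels, num_classes, k=3):
    """Mix each cut-layer activation with k partners from other classes
    (hypothetical mixup-style scheme; not the paper's exact MALM rule)."""
    n, _ = acts.shape
    one_hot = np.eye(num_classes)[labels]
    mixed_acts = np.empty_like(acts)
    mixed_labels = np.empty((n, num_classes))
    for i in range(n):
        others = np.flatnonzero(labels != labels[i])   # exploit category diversity
        partners = rng.choice(others, size=k, replace=False)
        idx = np.concatenate(([i], partners))
        w = rng.dirichlet(np.ones(k + 1))              # random convex mixing weights
        mixed_acts[i] = w @ acts[idx]                  # mixed activation sent onward
        mixed_labels[i] = w @ one_hot[idx]             # obfuscated (soft) label
    return mixed_acts, mixed_labels

def distance_correlation(X, Y):
    """Standard (biased) sample distance correlation between two data sets."""
    def centered(A):
        D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered(X), centered(Y)
    dcov2 = np.maximum((A * B).mean(), 0.0)
    return np.sqrt(dcov2) / ((A * A).mean() * (B * B).mean()) ** 0.25

# Toy demo: stand-in activations for 32 samples over 4 classes.
acts = rng.normal(size=(32, 5))
labels = rng.integers(0, 4, size=32)
mixed_acts, mixed_labels = mix_activations_and_labels(acts, labels, num_classes=4)

print(distance_correlation(acts, acts))        # 1.0: unmixed activations leak structure
print(distance_correlation(acts, mixed_acts))  # noticeably lower after mixing
```

Distance correlation is the statistic the abstract refers to: a value near 1 means the shared activations remain strongly (possibly nonlinearly) related to the originals, so a lower value suggests reconstruction attacks become harder.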
