Byzantine Resilience Research Articles

Decentralized learning has gained great popularity to improve learning efficiency and preserve data privacy. Each computing node makes equal contribution to collaboratively learn a Deep Learning model. The elimination of centralized Parameter Servers (PS) can effectively address many issues such as privacy, performance bottleneck and single-point-failure. However, how to achieve Byzantine Fault Tolerance in decentralized learning systems is rarely explored, although this problem has been extensively studied in centralized systems. In this paper, we present an in-depth study towards the Byzantine resilience of decentralized learning systems with two contributions. First, from the adversarial perspective, we theoretically illustrate that Byzantine attacks are more dangerous and feasible in decentralized learning systems: even one malicious participant can arbitrarily alter the models of other participants by sending carefully crafted updates to its neighbors. Second, from the defense perspective, we propose Ubar, a novel algorithm to enhance decentralized learning with Byzantine Fault Tolerance. Specifically, Ubar provides a <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">U</b> niform <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">B</b> yzantine-resilient <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">A</b> ggregation <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">R</b> ule for benign nodes to select the useful parameter updates and filter out the malicious ones in each training iteration. It guarantees that each benign node in a decentralized system can train a correct model under very strong Byzantine attacks with an arbitrary number of faulty nodes. We conduct extensive experiments on standard image classification tasks and the results indicate that Ubar can effectively defeat both simple and sophisticated Byzantine attacks with higher performance efficiency than existing solutions.

Machine learning (ML) solutions are nowadays distributed, according to the so-called server/worker architecture. One server holds the model parameters while several workers train the model. Clearly, such architecture is prone to various types of component failures, which can be all encompassed within the spectrum of a Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers. Yet all require trusting a central parameter server. We initiate in this paper the study of the “general” Byzantine-resilient distributed machine learning problem where no individual component is trusted. In particular, we distribute the parameter server computation on several nodes. We show that this problem can be solved in an asynchronous system, despite the presence of frac{1}{3} Byzantine parameter servers (i.e., n_{ps} > 3f_{ps}+1) and frac{1}{3} Byzantine workers (i.e., n_w > 3f_w), which is asymptotically optimal. We present a new algorithm, ByzSGD, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, scatter/gather, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, distributed median contraction (DMC), leverages the geometric properties of the median in high dimensional spaces to bring parameters within the correct servers back close to each other, ensuring safe and lively learning. The third, Minimum-diameter averaging (MDA), is a statistically-robust gradient aggregation rule whose goal is to tolerate Byzantine workers. MDA requires a loose bound on the variance of non-Byzantine gradient estimates, compared to existing alternatives [e.g., Krum (Blanchard et al., in: Neural information processing systems, pp 118-128, 2017)]. Interestingly, ByzSGD ensures Byzantine resilience without adding communication rounds (on a normal path), compared to vanilla non-Byzantine alternatives. ByzSGD requires, however, a larger number of messages which, we show, can be reduced if we assume synchrony. We implemented ByzSGD on top of both TensorFlow and PyTorch, and we report on our evaluation results. In particular, we show that ByzSGD guarantees convergence with around 32% overhead compared to vanilla SGD. Furthermore, we show that ByzSGD’s throughput overhead is 24–176% in the synchronous case and 28–220% in the asynchronous case.

Byzantine Resilience Research Articles

Related Topics

Articles published on Byzantine Resilience

Byzantine Fault-Tolerant Federated Learning Based on Trustworthy Data and Historical Information

Byzantine Machine Learning: A Primer

Sign-Based Gradient Descent With Heterogeneous Data: Convergence and Byzantine Resilience.

Collaborative Byzantine Resilient Federated Learning

Byzantine-Resilient Federated Learning at Edge

Practical Differentially Private and Byzantine-resilient Federated Learning

Byzantine Resilient Joint Localization and Target Tracking of Multi-Vehicle Systems

SPDL: A Blockchain-Enabled Secure and Privacy-Preserving Decentralized Learning System

Byzantine Resilient Distributed Learning in Multirobot Systems

SEAR: Secure and Efficient Aggregation for Byzantine-Robust Federated Learning

Byzantine-Resilient Decentralized Stochastic Gradient Descent

Genuinely distributed Byzantine machine learning

A simplified convergence theory for Byzantine resilient stochastic gradient descent

BRIDGE: Byzantine-Resilient Decentralized Gradient Descent

BCOOL: A Novel Blockchain Congestion Control Architecture Using Dynamic Service Function Chaining and Machine Learning for Next Generation Vehicular Networks

Byzantine Resilient Non-Convex SCSG With Distributed Batch Gradient Computations

A New Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Improved Learning Rate and Byzantine Resilience

Cross-Layer Metrics for Reliable Routing in Wireless Mesh Networks

Optimal Byzantine-resilient convergence in uni-dimensional robot networks

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Byzantine Resilience Research Articles

Related Topics

Articles published on Byzantine Resilience

Byzantine Fault-Tolerant Federated Learning Based on Trustworthy Data and Historical Information

Byzantine Machine Learning: A Primer

Sign-Based Gradient Descent With Heterogeneous Data: Convergence and Byzantine Resilience.

Collaborative Byzantine Resilient Federated Learning

Byzantine-Resilient Federated Learning at Edge

Practical Differentially Private and Byzantine-resilient Federated Learning

Byzantine Resilient Joint Localization and Target Tracking of Multi-Vehicle Systems

SPDL: A Blockchain-Enabled Secure and Privacy-Preserving Decentralized Learning System

Byzantine Resilient Distributed Learning in Multirobot Systems

SEAR: Secure and Efficient Aggregation for Byzantine-Robust Federated Learning

Byzantine-Resilient Decentralized Stochastic Gradient Descent

Genuinely distributed Byzantine machine learning

A simplified convergence theory for Byzantine resilient stochastic gradient descent

BRIDGE: Byzantine-Resilient Decentralized Gradient Descent

BCOOL: A Novel Blockchain Congestion Control Architecture Using Dynamic Service Function Chaining and Machine Learning for Next Generation Vehicular Networks

Byzantine Resilient Non-Convex SCSG With Distributed Batch Gradient Computations

A New Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Improved Learning Rate and Byzantine Resilience

Cross-Layer Metrics for Reliable Routing in Wireless Mesh Networks

Optimal Byzantine-resilient convergence in uni-dimensional robot networks