Abstract

To exploit the distributed nature of sensors, distributed machine learning has become the mainstream approach, but the differing computing capabilities of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. This paper describes a parameter communication optimization strategy that balances the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose Dynamic Finite Fault Tolerance (DFFT). Based on DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named the Dynamic Synchronous Parallel Strategy (DSP), which uses a performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and prevents model training from being disturbed by tasks unrelated to the sensors.
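As a rough illustration of the kind of per-iteration synchronization decision such a strategy could make, the following Python sketch is offered; the function name, arguments, and the min(α, w) rule are assumptions for exposition, not the paper's implementation.

    # Hypothetical sketch: should a worker wait for fresh global parameters?
    def must_synchronize(local_clock, slowest_clock, alpha, w):
        # local_clock   -- iterations this worker has completed
        # slowest_clock -- iterations completed by the slowest worker
        # alpha         -- staleness bound from the performance monitoring model
        # w             -- weak threshold that alpha may never exceed
        effective_bound = min(alpha, w)   # keeps the fault tolerance finite
        return local_clock - slowest_clock > effective_bound

    # A worker at iteration 12 with the slowest worker at 7, alpha = 4, w = 6:
    # the gap of 5 exceeds min(4, 6), so the worker blocks and pulls from the PS.
    print(must_synchronize(12, 7, alpha=4, w=6))   # True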

Highlights

  • Sensor networks have important applications such as environmental monitoring, industrial machine monitoring, body area networks and military target tracking

  • The weak threshold w ensures that the fault tolerance is finite; α is calculated from the performance monitoring model and changes over the course of distributed training, which makes the fault tolerance dynamic (see the sketch after this list)

  • We improve the Stale Synchronous Parallel strategy and propose the Dynamic Synchronous Parallel Strategy (DSP), based on Dynamic Finite Fault Tolerance (DFFT)

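To make the second highlight concrete, here is a small, self-contained Python sketch of how a performance monitoring model might derive α from measured per-iteration times and clamp it by the weak threshold w; the ratio-based formula is an illustrative assumption, not the paper's equation.

    def update_alpha(iter_times, w):
        # iter_times: latest per-iteration wall-clock time of each worker (seconds)
        fastest = min(iter_times)
        slowest = max(iter_times)
        # Let fast workers run ahead roughly in proportion to the speed gap,
        # but never beyond the weak threshold w (finiteness) and never below 0.
        alpha = int(round(slowest / fastest)) - 1
        return max(0, min(alpha, w))

    # Three workers measured at 0.8 s, 1.0 s and 2.1 s per iteration, w = 4:
    print(update_alpha([0.8, 1.0, 2.1], w=4))   # 2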

Summary

Introduction

Sensor networks have important applications such as environmental monitoring, industrial machine monitoring, body area networks, and military target tracking. To solve the problems of the Bulk Synchronous Parallel (BSP) strategy, asynchronous iterative strategies for distributed machine learning have been proposed [12,13,14,15], in which worker nodes can start the next iteration using local model parameters before receiving the global model parameters. Under such strategies, the fault tolerance is increased, which can cause the machine learning model to fall into local optima, so the accuracy may not be guaranteed. DSP dynamically adjusts the parameter synchronization strategy between the worker nodes and the PS based on the performance monitoring model, which effectively reduces the influence of performance differences among worker nodes and ensures the accuracy as well as the convergence rate.
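For comparison, the sketch below spells out the assumed blocking behaviour of BSP, a fully asynchronous strategy (labelled ASP here), SSP with a fixed staleness threshold, and DSP with the dynamically adjusted, w-capped threshold described above; the semantics are paraphrased for illustration, not taken verbatim from the paper.

    def blocks_for_global_params(strategy, clock_gap, alpha=None, w=None):
        # clock_gap: iterations this worker is ahead of the slowest worker
        if strategy == "BSP":   # synchronize at every iteration boundary
            return clock_gap > 0
        if strategy == "ASP":   # fully asynchronous: never blocks
            return False
        if strategy == "SSP":   # blocks once a fixed staleness threshold is passed
            return clock_gap > alpha
        if strategy == "DSP":   # threshold adjusted at run time, capped by w
            return clock_gap > min(alpha, w)
        raise ValueError(strategy)

    for s in ("BSP", "ASP", "SSP", "DSP"):
        print(s, blocks_for_global_params(s, clock_gap=3, alpha=2, w=5))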

Related Work
The Distributed Machine Learning System
Parameter Server System
Robust Optimization
Communication Optimization
Theoretical Analysis
Problems of SSP
Improvements of Node
Experimental Environment
The Finiteness of the Fault Tolerance
Figure 11. This figure compares the accuracy of the model for each parameter.
The Dynamics of the Fault Tolerance
Conclusions

