Abstract
To exploit the distributed nature of sensors, distributed machine learning has become the mainstream approach, but the heterogeneous computing capabilities of sensors and network delays strongly affect both the accuracy and the convergence rate of machine learning models. This paper describes a parameter communication optimization strategy that balances training overhead against communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose Dynamic Finite Fault Tolerance (DFFT). Based on DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named the Dynamic Synchronous Parallel Strategy (DSP), which uses a performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). DSP makes full use of the computing power of each sensor, maintains the accuracy of the machine learning model, and prevents model training from being disturbed by tasks unrelated to the sensors.
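As a rough illustration of the synchronization decision described above, the following Python sketch (our own, with hypothetical names such as may_proceed; not the authors' implementation) shows a bounded-staleness check: a worker proceeds with its locally cached parameters only while it is within the current staleness bound of the slowest worker.

def may_proceed(worker_clock: int, slowest_clock: int, alpha: int) -> bool:
    """Return True if the worker may start its next iteration with its
    local parameters, i.e. it is at most alpha iterations ahead of the
    slowest worker; in DSP, alpha is recomputed at run time by the
    performance monitoring model."""
    return worker_clock - slowest_clock <= alpha

# With alpha = 0 every worker waits for all others at each iteration
# (BSP-like); a fixed alpha gives a stale-synchronous scheme; DSP
# varies alpha during training.
assert may_proceed(worker_clock=5, slowest_clock=4, alpha=2)
assert not may_proceed(worker_clock=7, slowest_clock=4, alpha=2)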
Highlights
Sensor networks have important applications such as environmental monitoring, industrial machine monitoring, body area networks and military target tracking
The weak threshold w keeps the fault tolerance finite, while α, computed from the performance monitoring model, changes over the course of distributed training to make the fault tolerance dynamic (see the sketch after this list)
Building on the Stale Synchronous Parallel strategy, we propose the Dynamic Synchronous Parallel Strategy (DSP), based on the Dynamic Finite Fault Tolerance (DFFT)
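The highlight about the weak threshold w and the dynamic bound α suggests one possible monitoring rule, sketched below under our own assumptions (the statistics and the formula are illustrative; the paper's exact model may differ): α grows with the measured speed gap between workers but is capped by w, so the tolerated staleness stays finite.

def dynamic_alpha(iter_times: dict[str, float], w: int) -> int:
    """Derive the staleness bound from each worker's recent per-iteration
    time: more heterogeneity allows more staleness, but the weak threshold
    w keeps the bound (and hence the fault tolerance) finite."""
    fastest = min(iter_times.values())
    slowest = max(iter_times.values())
    alpha = round(slowest / fastest) - 1
    return max(0, min(alpha, w))

# Example: three sensors with different compute speeds and w = 4.
print(dynamic_alpha({"s1": 0.8, "s2": 1.1, "s3": 2.9}, w=4))  # prints 3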
Summary
Sensor networks have important applications such as environmental monitoring, industrial machine monitoring, body area networks, and military target tracking. To address the problems of the Bulk Synchronous Parallel (BSP) strategy, asynchronous iterative strategies for distributed machine learning have been proposed [12,13,14,15], in which worker nodes may start an iteration using their local model parameters before receiving the global model parameters. These strategies increase fault tolerance, but this can cause the machine learning model to fall into local optima, so accuracy may not be guaranteed. DSP dynamically adjusts the parameter synchronization strategy between worker nodes and the PS based on the performance monitoring model, which effectively reduces the impact of heterogeneous worker-node performance and preserves both accuracy and convergence rate.
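To make this behavior concrete, here is a toy simulation under the same illustrative assumptions as the sketches above: three workers of different speeds advance their iteration counters, and the bounded-staleness check lets fast workers run ahead by at most the current bound instead of blocking at every iteration as BSP would.

clocks = {"s1": 0, "s2": 0, "s3": 0}   # per-worker iteration counters
speeds = {"s1": 3, "s2": 2, "s3": 1}   # iterations per time step (assumed)
ALPHA = 2  # fixed here for brevity; DSP would recompute it each step
           # from the performance monitoring model

for step in range(4):
    for worker, speed in speeds.items():
        for _ in range(speed):
            # Proceed with local parameters only while within the bound;
            # otherwise the worker waits for fresh global parameters.
            if clocks[worker] - min(clocks.values()) <= ALPHA:
                clocks[worker] += 1
    print(f"step {step}: {clocks}")

The printed clocks never drift more than the bound away from the slowest worker, which is the behavior DSP enforces while also adapting the bound itself at run time.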