Abstract

For space-based remote sensing systems, onboard intelligent processing based on deep learning has become an inevitable trend. To adapt to dynamic changes in the observed scenes, there is an urgent need to perform distributed deep learning onboard so that the plentiful real-time sensing data from the multiple satellites of a smart constellation can be fully used. However, the network bandwidth of a smart constellation is very limited, so research on distributed training in a low-bandwidth environment is of great significance. This paper proposes a Randomized Decentralized Parallel Stochastic Gradient Descent (RD-PSGD) method for distributed training in a low-bandwidth network. To reduce the communication cost, each node in RD-PSGD transfers only a randomly selected part of its local model's information to its neighbors. We further speed up the algorithm by optimizing the implementation of random index generation and parameter extraction. For the first time, we theoretically analyze the convergence property of the proposed RD-PSGD. We validate the advantage of the method through simulation experiments on various distributed training tasks for image classification, using different benchmark datasets and deep learning network architectures. The results show that, compared with the TopK-based method, RD-PSGD effectively saves the time and bandwidth cost of distributed training and reduces the complexity of parameter selection. The proposed method provides a new perspective for the study of onboard intelligent processing, especially for online learning on a smart satellite constellation.

Highlights

  • With the breakthrough development of artificial intelligence and the rapid improvement of onboard computing and storage capabilities, it is an inevitable trend for remote sensing satellite systems to directly generate information required by users through intelligent processing onboard [1, 2]

  • Depending on how the tasks are parallelized across satellites, the distributed training can be divided into two categories: model parallelism and data parallelism [3]

  • We prove the convergence of Randomized Decentralized Parallel Stochastic Gradient Descent (RD-PSGD)


Summary

Introduction

With the breakthrough development of artificial intelligence and the rapid improvement of onboard computing and storage capabilities, it is an inevitable trend for remote sensing satellite systems to directly generate the information required by users through intelligent processing onboard [1, 2]. Because satellites operate in an environment unlike that of a cluster system on the ground, the network bandwidth of the smart constellation is often very limited. Developing distributed deep learning for a low-bandwidth environment is therefore both significant and urgent in practice. The decentralized network structure removes the central parameter server and allows all nodes to exchange parameters or gradients with adjacent nodes. In this way, the communication load is shared among the nodes, which avoids congestion and improves the real-time capability of distributed training. This paper proposes a novel method named RD-PSGD (Randomized Decentralized Parallel Stochastic Gradient Descent), which reduces communication bandwidth by parameter sparsification.
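The idea of exchanging only a random subset of parameters with adjacent nodes can be illustrated with a minimal sketch. It is not the paper's implementation: the ring topology, the equal-weight averaging rule, the use of one shared index set per round, and the names `sample_indices` and `sparse_gossip_round` are all assumptions made for illustration, and the local gradient step that would precede each exchange is omitted.

```python
import random


def sample_indices(num_params, ratio, rng):
    """Pick a uniformly random subset of parameter indices to transmit."""
    k = max(1, int(num_params * ratio))
    return sorted(rng.sample(range(num_params), k))


def sparse_gossip_round(params, ratio, rng):
    """One communication round on a ring of nodes (hypothetical sketch).

    params: list of per-node parameter lists, all of the same length.
    Only the sampled coordinates are exchanged. On those coordinates each
    node averages its own value with the values of its two ring neighbors;
    all other coordinates stay local and unchanged.
    """
    n = len(params)
    # Assumption: every node uses the same index set in a given round,
    # for example by deriving it from a shared random seed.
    idx = sample_indices(len(params[0]), ratio, rng)
    new_params = [list(p) for p in params]
    for node in range(n):
        left = params[(node - 1) % n]
        right = params[(node + 1) % n]
        for i in idx:
            new_params[node][i] = (left[i] + params[node][i] + right[i]) / 3.0
    return new_params, idx


if __name__ == "__main__":
    rng = random.Random(0)
    # Four nodes, ten parameters each; node j starts with all values equal to j.
    nodes = [[float(j)] * 10 for j in range(4)]
    mixed, sent = sparse_gossip_round(nodes, ratio=0.5, rng=rng)
    print(len(sent), "of", len(nodes[0]), "parameters sent to each neighbor")
```

In this sketch the per-link traffic scales with `ratio`, and because the indices are drawn at random no sorting or thresholding of parameter magnitudes is needed, in contrast to a TopK-style selection.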

Methodology
Programming Optimization
Experiments
Conclusion and Future Work
Findings
Conflicts of Interest