Abstract

The Alternating Direction Method of Multipliers (ADMM) has proven to be a useful alternative to popular gradient-based optimizers and has been successfully applied to training DNN models. However, existing ADMM-based approaches generally fail to strike a good trade-off between rapid convergence and fast training, and they do not support parallel DNN training on multiple GPUs. These drawbacks seriously hinder them from effectively training DNN models on modern GPU computing platforms, which are typically equipped with multiple GPUs. In this paper, we propose pdlADMM, which trains DNNs effectively in a data-parallel manner. The key insight of pdlADMM is that it derives an efficient solution for each sub-problem by jointly considering three main factors: computational complexity, convergence, and suitability for parallel computing. As the number of GPUs grows, pdlADMM retains rapid convergence while the computational load on each GPU tends to decline. Extensive experiments demonstrate the effectiveness of our proposal: compared to two state-of-the-art ADMM-based approaches, pdlADMM converges significantly faster, obtains better accuracy, and achieves very competitive training speed.
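
To make the data-parallel ADMM idea concrete, the following is a minimal sketch rather than the paper's pdlADMM algorithm: it applies consensus ADMM to a single linear layer, with the training data split across several simulated workers, a closed-form local weight sub-problem, a consensus averaging step, and a dual update. The penalty parameter rho, the toy regression task, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Minimal consensus-ADMM sketch for data-parallel training of one linear layer.
# This is NOT pdlADMM itself; it only illustrates how ADMM splits work across
# data shards (workers) and recombines them through a consensus variable.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, split into shards to mimic multiple GPUs/workers.
n_samples, n_in, n_out, n_workers = 512, 8, 4, 4
X = rng.standard_normal((n_samples, n_in))
W_true = rng.standard_normal((n_out, n_in))
Y = X @ W_true.T
X_shards = np.array_split(X, n_workers)
Y_shards = np.array_split(Y, n_workers)

rho = 1.0                                         # ADMM penalty (assumed value)
W = np.zeros((n_out, n_in))                       # consensus weights
W_local = [W.copy() for _ in range(n_workers)]    # per-worker weights
U = [np.zeros_like(W) for _ in range(n_workers)]  # scaled dual variables

for it in range(50):
    # 1) Local sub-problem on each worker (solvable in parallel):
    #    min_Wk ||Xk Wk^T - Yk||^2 + (rho/2) ||Wk - W + Uk||^2, closed form.
    for k in range(n_workers):
        Xk, Yk = X_shards[k], Y_shards[k]
        A = Xk.T @ Xk + (rho / 2.0) * np.eye(n_in)
        B = Xk.T @ Yk + (rho / 2.0) * (W - U[k]).T
        W_local[k] = np.linalg.solve(A, B).T
    # 2) Consensus update: average local solutions plus duals.
    W = sum(Wk + Uk for Wk, Uk in zip(W_local, U)) / n_workers
    # 3) Dual update on each worker.
    for k in range(n_workers):
        U[k] += W_local[k] - W

print("relative consensus error:",
      np.linalg.norm(W - W_true) / np.linalg.norm(W_true))
```

In this sketch each worker touches only its own data shard, so the per-worker cost shrinks as workers are added, while the averaging and dual steps keep the local copies consistent; pdlADMM's actual sub-problems additionally decouple the network layer by layer, which this toy example does not show.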
