Abstract

With the growing scale of data volumes and neural network sizes, we have entered the era of distributed deep learning. High-performance training and inference on distributed computing systems have been attracting increasing research attention in both academia and industry. Meanwhile, the diversity of existing machine learning frameworks (e.g., TensorFlow, PyTorch, and MXNet) and the explosion of deep learning hardware (e.g., CPUs, GPUs, FPGAs, and ASICs) make it more challenging for users to leverage new deep learning technologies and the acceleration capabilities of hardware devices. We first survey the state-of-the-art work in this area, which informs our vision of future deep learning frameworks. We then propose HPDL, a general framework for high-performance distributed deep learning that is compatible with existing frameworks and adaptive to various hardware architectures. Finally, we discuss and foresee the key technologies for fulfilling high-performance, large-scale deep learning, including optimization algorithms, hybrid communication mechanisms, model parallelization, resource scheduling, and single-node execution optimization.
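The abstract does not elaborate on these key technologies; as a purely illustrative sketch (not part of HPDL or the paper's method), the snippet below shows one common baseline the paper's setting implies: synchronous data-parallel training with gradient synchronization via PyTorch's DistributedDataParallel. The model, synthetic data, and hyperparameters are placeholders chosen for illustration.

```python
# Illustrative only: synchronous data-parallel training with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_example.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; any nn.Module is wrapped the same way.
    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)
    ).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        # Synthetic batch; in practice a DistributedSampler shards the dataset.
        inputs = torch.randn(32, 1024, device=local_rank)
        targets = torch.randint(0, 10, (32,), device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()   # gradients are all-reduced across workers here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In this data-parallel baseline, communication cost is dominated by the per-step all-reduce of gradients; the hybrid communication, model parallelization, and scheduling techniques mentioned above target exactly the cases where this simple scheme stops scaling.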
