Abstract

Data parallelism and model parallelism are regarded as the two major parallelization strategies for deep neural networks (DNNs). However, both methodologies achieve acceleration mainly through coarse-grained, network-model-level parallelization, and neither fully exploits the parallelism available in network models and many-core systems such as GPUs. In this work, we propose FiLayer, a novel fine-grained parallelism strategy based on layer-wise parallelization, which comprises inter-layer parallelism and intra-layer parallelism. The former processes several adjacent layers of a network model in a pipelined manner; the latter divides the operations within one layer into several parts and processes them in parallel. CUDA streams are applied to realize both forms of fine-grained parallelism. FiLayer is implemented by extending Caffe and evaluated on several typical datasets. The experimental results indicate that FiLayer helps Caffe achieve speedups of \(1.58\times\)–\(2.19\times\).
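To make the intra-layer idea concrete, the following is a minimal CUDA sketch, not FiLayer's actual implementation: one layer's element-wise work (ReLU here) is split into chunks, each launched on its own CUDA stream so independent chunks may execute concurrently on the device. The names NUM_STREAMS, relu_chunk, and relu_layer_multistream are illustrative assumptions, not identifiers from the paper or from Caffe.

```cuda
// Hypothetical sketch of intra-layer parallelism via CUDA streams.
// A layer's element-wise computation is partitioned into NUM_STREAMS
// chunks; each chunk is launched on a separate stream.
#include <cuda_runtime.h>

#define NUM_STREAMS 4  // illustrative chunk count, not from the paper

__global__ void relu_chunk(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

// d_in/d_out are device pointers to the layer's input/output of size n.
void relu_layer_multistream(const float* d_in, float* d_out, int n) {
    cudaStream_t streams[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s)
        cudaStreamCreate(&streams[s]);

    int chunk = (n + NUM_STREAMS - 1) / NUM_STREAMS;
    for (int s = 0; s < NUM_STREAMS; ++s) {
        int offset = s * chunk;
        if (offset >= n) break;
        int len = (offset + chunk > n) ? (n - offset) : chunk;
        int threads = 256;
        int blocks = (len + threads - 1) / threads;
        // Launch this chunk on its own stream; chunks are independent,
        // so the GPU scheduler is free to overlap their execution.
        relu_chunk<<<blocks, threads, 0, streams[s]>>>(
            d_in + offset, d_out + offset, len);
    }

    // Wait for all chunks before the layer's output is consumed.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```

Inter-layer parallelism follows the same stream mechanism at a coarser granularity: adjacent layers are bound to different streams so that, once a mini-batch fragment leaves layer \(i\), layer \(i{+}1\) can start on it while layer \(i\) processes the next fragment, forming a pipeline.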
