Abstract

Convolutional neural networks (CNNs) extract features from large-scale data through a multilayer network structure. Owing to this effectiveness, CNNs have achieved great success in fields such as computer vision and speech analysis. However, CNN training is challenging because computing the gradients through multiple layers is time consuming. In this paper, we propose to accelerate the gradient computation of the convolutional layer with a CPU+MIC heterogeneous computing technique. Specifically, we profile the time cost of computing the gradients of every layer in the Caffe framework and find that the convolutional layer dominates the overall computational overhead. Based on this observation, we implement the intensive matrix operations of the convolutional layers on the MIC coprocessor with OpenMP within the Caffe framework. To fully utilize the threads provided by the MIC, we employ two types of threads, data threads and MKL threads, and give a thread-setting strategy supported by both theoretical and empirical analysis. We evaluate our acceleration method on several typical CNN models trained on the ImageNet dataset, and show that it speeds up the computation of the convolutional layer by about 6.8 times, and the overall training by about 5.8 times, compared with a single CPU.
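The "intensive matrix manipulations" the abstract refers to come from the way Caffe lowers convolution to a general matrix multiplication (GEMM) via im2col, which is exactly the kind of workload MKL can parallelize on the MIC coprocessor. A minimal pure-Python sketch of this lowering (single channel, stride 1, no padding, cross-correlation as in deep-learning frameworks; not code from the paper):

```python
# Illustrative sketch, not Caffe source: lowering a 2-D convolution to
# im2col + GEMM, the matrix form that runs on the MIC coprocessor.

def im2col(x, k):
    """Unroll every k x k patch of the 2-D input x into one row of a matrix."""
    h, w = len(x), len(x[0])
    return [[x[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1)
            for j in range(w - k + 1)]

def matmul(a, b):
    """Naive GEMM (in Caffe this call is dispatched to MKL's sgemm)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conv2d(x, kernel):
    """Convolution expressed as im2col followed by a matrix product."""
    k = len(kernel)
    cols = im2col(x, k)                                        # (out_h*out_w) x (k*k)
    w = [[kernel[i][j]] for i in range(k) for j in range(k)]   # (k*k) x 1
    flat = matmul(cols, w)
    out_w = len(x[0]) - k + 1
    return [[flat[r * out_w + c][0] for c in range(out_w)]
            for r in range(len(x) - k + 1)]
```

Because both the forward pass and the gradient computations of the convolutional layer reduce to such matrix products, offloading the GEMM to the coprocessor accelerates precisely the step that dominates training time.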
