Parallel Computing in DNNs Using CPU and MIC

Sijiang Fan, Zhiying Wang, Shiqing Zhang, Li Shen, Jiawei Fei, Ximing He

doi:10.1109/ispa/iucc.2017.00102

Abstract

Accelerating the training of Deep Neural Networks (DNNs) has been a focus of the deep learning field, and many studies have accelerated deep learning on different platforms. Among these platforms, the Intel Xeon Phi coprocessor is a many-core platform that provides both strong programmability and high performance. However, previous work on the Intel Many Integrated Core (MIC) architecture focused on parallel computing on the MIC alone. In this paper, we speed up the training of DNNs for automatic speech recognition using a CPU+MIC architecture, in which training is executed on both the MIC and the CPU. We apply several optimization methods for I/O and computation and set up experiments to validate them. With all methods combined, results show that our optimized algorithm achieves a speedup of about 20x over the original sequential algorithm running on a single CPU core.

Full Text