Abstract

Acceleration for the training process of Deep Neural Networks (DNNs) has been the focus of deep learning field. There were many researches of accelerating deep learning on different platforms. Among them, Intel Xeon Phi Coprocessor is a many-core platform which provides both strong programmability and high performance. But previous work about Intel Many Integrated Core (MIC) focused on parallel computing only in MIC. In this paper, we speed up the training process of DNNs applied for automatic speech recognition with CPU+MIC architecture. In this architecture, the training process of DNNs is executed both on MIC and CPU. We apply several optimization methods for I/O and calculation and set up experiments to approve these methods. Putting all methods together, results show that our optimized algorithm acquires about 20x speedup compared with the original sequential algorithm on CPU which uses one core.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.