Abstract
Current microprocessors contain SIMD execution units (also called multimedia or vector extensions) that allow the data-parallel execution of operations on several subwords packed into 64-bit or 128-bit registers. These units can accelerate not only typical multimedia applications but also many other algorithms based on vector and matrix operations. This paper presents the results of a detailed experimental study of the suitability of such units for the fast simulation of neural networks. It is shown that speedups ranging from 2.0 to 8.6 over sequential implementations can be achieved. A performance counter analysis is provided that explains several of the observed effects in terms of processor architecture features.
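The core operation being accelerated is the kind of vector arithmetic that dominates neural network simulation, such as a neuron's weighted input sum. The sketch below is not the authors' implementation, but a minimal illustration of the subword parallelism described above, using 128-bit SSE intrinsics to process four single-precision values per instruction; the function name `simd_dot` and the assumption that the vector length is a multiple of four are illustrative choices.

```c
#include <xmmintrin.h>  /* SSE intrinsics: 128-bit registers holding 4 packed floats */

/* Dot product of a weight vector and an input vector, processing
 * four floats per SIMD operation. Assumes n is a multiple of 4;
 * a scalar tail loop would handle any remainder. */
float simd_dot(const float *w, const float *x, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 wv = _mm_loadu_ps(w + i);            /* load 4 weights */
        __m128 xv = _mm_loadu_ps(x + i);            /* load 4 inputs  */
        acc = _mm_add_ps(acc, _mm_mul_ps(wv, xv));  /* accumulate 4 partial products */
    }
    /* Horizontal sum of the four partial sums. */
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Compared with a scalar loop, this performs four multiply-accumulate operations per iteration, which is the source of the kind of speedup reported in the abstract (the exact gain depends on memory bandwidth and the surrounding network code).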