Abstract
This paper analyses the performance of state-of-the-art media ISA (instruction set architecture) extensions in a general-purpose processor when executing a video encoder based on an affine motion model. In addition to SIMD (single instruction, multiple data) fixed-point instructions, these ISA extensions include SIMD floating-point instructions, special-purpose SIMD fixed-point instructions, and cacheability control instructions. In this study, eight time-consuming kernels of the video encoder were hand-optimized in two ways: using instructions from all four categories of these media ISA extensions (the FLP version), and using only SIMD fixed-point instructions, without special-purpose instructions (the FXP version). The FLP version achieved an average kernel-level speedup of 1.37X and an application-level speedup of 1.11X over the FXP version, and an application-level speedup of 3.41X over the C version.