Abstract

This paper analyses the performance of state-of-the-art media ISA (instruction set architecture) extensions in a general-purpose processor when executing a video encoder based on an affine motion model. In addition to SIMD (single instruction, multiple data) fixed-point instructions, these ISA extensions include SIMD floating-point instructions, special-purpose SIMD fixed-point instructions, and cacheability control instructions. In this study, eight time-consuming kernels of the video encoder were hand-optimized using instructions from all four categories of these media ISA extensions (the FLP version). The same kernels were also hand-optimized using only the SIMD fixed-point instructions, without the special-purpose instructions (the FXP version). Compared to the FXP version, the FLP version achieved an average kernel-level speedup of 1.37X and an application-level speedup of 1.11X; compared to the C version, it achieved an application-level speedup of 3.41X.

