Abstract

We studied an efficient software implementation of the H.264/AVC sub-pixel motion estimation (ME) algorithm on a VLIW-SIMD digital signal processor, the TMS320C6416. Sub-pixel ME requires a large number of memory accesses, while the arithmetic operations involved are fairly simple. Although the CPU clock cycles spent on arithmetic can be reduced considerably by employing sub-word operations and software pipelining, the limited memory bandwidth of the architecture restricts overall performance. Moreover, aggressive VLIW-SIMD optimization can degrade performance by causing excessive CPU stalls during memory accesses. In this paper, we relieved the memory bandwidth required to create quarter-pixel images by reducing the precision of the image data from 8 bits to 4 bits. As a result, the number of memory accesses is greatly reduced at the cost of some additional arithmetic operations, which improves the balance between arithmetic and memory access operations. The experimental results show that memory stall cycles decrease by 80% and that a speed-up of 260% is obtained. The bit rate of the encoded video stream increases slightly, by about 2% on average, due to the effects of quantization.
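The precision reduction can be illustrated with a minimal sketch. It assumes plain truncation to the four most significant bits and two samples packed per byte; the abstract does not state the actual quantization or packing scheme, so the function names `pack_4bit` and `unpack_4bit` and the sample row are hypothetical and only show why the stored data halves in size while unpacking costs extra arithmetic.

```python
def pack_4bit(pixels):
    """Quantize 8-bit samples to 4 bits and pack two samples per byte.

    Assumption: truncation to the 4 most significant bits. The paper's
    actual quantizer is not described in the abstract.
    """
    if len(pixels) % 2:
        raise ValueError("expected an even number of samples")
    packed = bytearray()
    for i in range(0, len(pixels), 2):
        hi = pixels[i] >> 4        # first sample, upper nibble
        lo = pixels[i + 1] >> 4    # second sample, lower nibble
        packed.append((hi << 4) | lo)
    return bytes(packed)


def unpack_4bit(packed):
    """Expand packed nibbles back to the 8-bit range (low 4 bits are lost)."""
    out = bytearray()
    for b in packed:
        out.append(b & 0xF0)           # upper nibble, already in place
        out.append((b & 0x0F) << 4)    # lower nibble shifted up
    return bytes(out)


row = bytes([0, 17, 128, 255, 200, 63, 64, 99])  # hypothetical sample row
packed = pack_4bit(row)
restored = unpack_4bit(packed)
```

Here `packed` holds half as many bytes as `row`, so each memory access fetches twice as many samples. The shifts and masks in `unpack_4bit` are the added arithmetic, and the discarded low bits are the quantization error that the abstract links to the small bit-rate increase.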
