Abstract
<p>Fractional Motion Estimation (FME) is an important part of the H.264/AVC video encoding standard. FME can significantly increase the compression ratio achievable by video encoders while improving video quality. However, it is computationally expensive and can account for over 45% of the total motion estimation runtime. To maximize the performance and hardware utilization of FME implementations on Field-Programmable Gate Arrays (FPGAs), one needs to effectively exploit the inherent parallelism in the algorithm. In this work we explore two approaches to FME algorithm parallelization in order to effectively increase the processing power of the computing hardware: the first is referred to as vertical scaling and the second as horizontal scaling. In total, we implemented six scaled FME designs on a Xilinx Virtex-5 FPGA. We found that our best scaled FME design exhibited a speedup of 8x over the horizontally scaled designs. Additionally, we conclude that scaling vertically within a 4x4 pixel sub-block is more efficient than scaling horizontally across several sub-blocks. As a result, we were able to achieve higher video resolutions at lower resource costs. In particular, it is shown that the best vertically scaled design can achieve 30 fps of QSXGA (2560x2048) video using 4 reference frames with only 25.5K LUTs and 28.7K registers.</p>
Highlights
Introduction to Video: Many digital video formats exist, varying in their target resolutions, colour space, chrominance sub-sampling ratios and signal sampling frequencies
We introduce the basic building block of the Interpolation Engine (IE), the Finite Impulse Response (FIR) filter, and show how it is implemented to create Horizontal Interpolation Units (H-IPUs) and Vertical Interpolation Units (V-IPUs)
In particular, we show how vertical alignment is influenced by the number of clock cycles it takes for the Processing Unit's (PU) 2-D Hadamard Transform to process a set of 4x4 pixels
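To make the highlight above concrete, below is a minimal Python sketch of the 4x4 2-D Hadamard transform as it is commonly used for SATD (sum of absolute transformed differences) cost computation in H.264 encoders. The function names and the flat absolute-sum cost are illustrative assumptions, not the paper's RTL design:

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hadamard_satd(diff):
    """Apply the 4x4 2-D Hadamard transform to a 4x4 residual block
    and return the sum of absolute transformed coefficients (SATD)."""
    # 4x4 Hadamard matrix: mutually orthogonal rows with +/-1 entries,
    # so the transform needs only additions and subtractions in hardware.
    H = [[1,  1,  1,  1],
         [1,  1, -1, -1],
         [1, -1, -1,  1],
         [1, -1,  1, -1]]
    Ht = [list(row) for row in zip(*H)]   # transpose of H
    t = matmul(matmul(H, diff), Ht)       # 2-D transform: H * D * H^T
    return sum(abs(c) for row in t for c in row)
```

A hardware PU would typically pipeline the row and column butterfly stages over several clock cycles; that per-4x4-block latency is what the vertical-alignment discussion refers to.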
Summary
Many digital video formats exist, varying in their target resolutions, colour space, chrominance sub-sampling ratios and signal sampling frequencies. The mathematical representation of a set of colours is called a colour space [27]. Popular models are RGB, used in computer graphics; YIQ, YUV and YCbCr, popular in video broadcasting systems; and CMYK, used in colour printing. The YCbCr format has one luminance (Y) and two chrominance components (Cb and Cr). The luminance and chrominance components are quantized as 8-bit data values. The 8-bit luminance value represents a single unsigned integer byte of data, ranging from 0 to 255, where 0 corresponds to a purely black pixel and 255 to a purely white pixel [27].
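The RGB-to-YCbCr relationship described above can be sketched with the full-range BT.601 conversion equations, which match the 0 (black) to 255 (white) luminance range quoted in the text. This is a minimal illustration; the function name and the clamping convention are assumptions, not part of the cited source:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert full-range 8-bit RGB to full-range BT.601 YCbCr.
    Y spans 0..255 (black..white); Cb and Cr are centred at 128."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    clamp = lambda v: max(0, min(255, int(round(v))))
    return clamp(y), clamp(cb), clamp(cr)
```

For example, a pure white pixel maps to maximum luminance with neutral chrominance, consistent with the 0-to-255 luminance scale described above.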