High-Throughput Deblocking Filter Architecture Using Quad Parallel Edge Filter for H.264 Video Coding Systems

Prayline Rajabai C, S Sivanantham

doi:10.1109/access.2019.2930149

Open Access

Abstract

Growing demand for better video quality in multimedia devices has driven the evolution of video coding standards over the past two decades, and optimizing the architectures of the individual modules in the video codec has become a popular research direction. This paper proposes an efficient architecture for the deblocking filter, which smooths the pixels of the decompressed video data, using both pipelining and parallelism. Filtering follows a sequential order: the vertical edges of the luma and chroma blocks are filtered first, followed by the horizontal edges of the luma and chroma blocks. Three pipeline stages are used, and four edges, either vertical or horizontal, are filtered in parallel. Internal buffers, which hold the sub-blocks read from the external frame buffers, are accessed in a ping-pong fashion to filter adjacent sub-edges, which reduces the number of external memory access cycles. Throughput is increased by combining parallelism with a novel edge filtering order, a self-transposing mechanism, and ping-pong buffer access. The proposed quad parallel edge deblocking filter architecture is implemented using a Synopsys 90 nm library. It achieves a target area of 19.8 K and can process a macroblock in 58 clock cycles.
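The coarse filtering order described in the abstract can be illustrated with a short Python sketch. It assumes 4:2:0 sampling (a 16×16 luma block and two 8×8 chroma blocks per macroblock) and counts edges in 4-sample sub-edges; the function name and tuple layout are hypothetical, and the sketch does not reproduce the paper's novel order within each direction.

```python
# Hypothetical sketch: enumerate the 4-sample sub-edges of one macroblock
# in the coarse order "vertical luma/chroma, then horizontal luma/chroma".
# Assumes 4:2:0 sampling; not the paper's exact schedule.

PLANES = (("luma", 16), ("cb", 8), ("cr", 8))  # (plane name, block size in samples)
SUB_EDGE = 4                                    # samples per sub-edge


def enumerate_sub_edges():
    """Return a list of (direction, plane, edge_position, segment_offset)."""
    order = []
    for direction in ("vertical", "horizontal"):
        for plane, size in PLANES:
            # One edge every 4 samples, including the macroblock boundary at 0.
            for edge_pos in range(0, size, SUB_EDGE):
                # Each edge is split into 4-sample segments along its length.
                for segment in range(0, size, SUB_EDGE):
                    order.append((direction, plane, edge_pos, segment))
    return order


def group_in_fours(sub_edges):
    """Split the list into groups of four, mimicking four parallel edge filters."""
    return [sub_edges[i:i + 4] for i in range(0, len(sub_edges), 4)]


if __name__ == "__main__":
    edges = enumerate_sub_edges()
    print(len(edges), "sub-edges,", len(group_in_fours(edges)), "groups of four")
```

Under these assumptions a macroblock has 48 sub-edges, which form 12 groups of four. This is a count of work items only; it says nothing about how many clock cycles each group takes in the proposed hardware.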

Highlights

Emerging trends and advances in video technology and the electronics industry over the past two decades have increased the amount of image/video data produced by still-image and video cameras
The control unit controls five different operations performed in three pipeline stages: memory read and boundary strength calculation in the first stage, filter decision and filtering of the sub-edges in the second stage, and memory write in the third stage
This paper presents the deblocking filter (DBF) algorithms and hardware architectures for the H.264/AVC video codec
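The three-stage pipeline mentioned in the highlights can be sketched as an idealized schedule. The following Python model assumes that one group of edges enters the pipeline per cycle and that the pipeline never stalls; both are simplifying assumptions, and the stage labels and function names are illustrative rather than taken from the paper.

```python
# Hypothetical model of a stall-free three-stage pipeline.
# Stage contents follow the highlights; the timing is an idealization.

STAGES = (
    "memory read + boundary strength",
    "filter decision + edge filtering",
    "memory write",
)


def pipeline_schedule(num_groups):
    """Map each cycle to the (stage, group) pairs active in that cycle."""
    schedule = {}
    for group in range(num_groups):
        for stage_index, stage in enumerate(STAGES):
            # Group g occupies stage s during cycle g + s.
            schedule.setdefault(group + stage_index, []).append((stage, group))
    return schedule


def total_cycles(num_groups):
    """Cycles needed to drain the pipeline: fill latency plus one per group."""
    if num_groups == 0:
        return 0
    return num_groups + len(STAGES) - 1


if __name__ == "__main__":
    for cycle, work in sorted(pipeline_schedule(4).items()):
        print(cycle, work)
```

In this model, once the pipeline is full all three stages work on different groups in the same cycle, so the latency of the read and write stages is hidden behind the filtering stage.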

Summary

INTRODUCTION

Emerging trends and advances in video technology and the electronics industry over the past two decades have increased the amount of image/video data produced by still-image and video cameras. In [9], a memory-efficient architecture is implemented in which system throughput is improved by using a hybrid filtering order and pixel reuse. This architecture uses two single-port SRAMs of size 96×32 and 2N×32 to store the current block and the neighboring data (N represents the width of the coded frame). The edges of each 4×4 block are scheduled in a horizontal-vertical interleaved fashion, and a transpose memory transposes the pixel data whenever the filtering edge changes from horizontal to vertical or from vertical to horizontal. This architecture consumes 300 clock cycles per macroblock (MB) with an area of 13.41K in 0.25 μm technology, excluding the dual-port RAM of size 16×32. The architecture in [2] uses a six-stage pipeline, filters four edges in parallel, and processes an MB in 64 clock cycles.
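To put the cycles-per-MB figures above in perspective, the minimum clock frequency needed to sustain a given resolution and frame rate can be estimated as cycles per MB × MBs per frame × frames per second. The Python sketch below does this arithmetic; the 1920×1080 at 30 frames per second target is an assumed example, not a figure from the paper, and the estimate ignores any overhead outside the deblocking filter.

```python
# Hypothetical back-of-the-envelope estimate; resolution and frame rate
# are assumed example values, not results reported by the paper.

MB_SIZE = 16  # a macroblock covers 16x16 luma samples

# Cycles per macroblock as quoted in the text above.
CYCLES_PER_MB = {"[9]": 300, "[2]": 64, "proposed": 58}


def macroblocks_per_frame(width, height):
    """Number of macroblocks, rounding each dimension up to a multiple of 16."""
    mbs_x = -(-width // MB_SIZE)   # ceiling division
    mbs_y = -(-height // MB_SIZE)
    return mbs_x * mbs_y


def required_clock_hz(cycles_per_mb, width, height, fps):
    """Minimum clock frequency for the filter to keep up with the video."""
    return cycles_per_mb * macroblocks_per_frame(width, height) * fps


if __name__ == "__main__":
    for name, cycles in CYCLES_PER_MB.items():
        hz = required_clock_hz(cycles, 1920, 1080, 30)
        print(f"{name}: {hz / 1e6:.1f} MHz")
```

For the assumed 1080p, 30 frames per second target (8160 macroblocks per frame), 58 cycles per MB corresponds to roughly 14.2 MHz, 64 cycles per MB to roughly 15.7 MHz, and 300 cycles per MB to roughly 73.4 MHz.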

QUAD PARALLEL EDGE DEBLOCKING FILTER ARCHITECTURE

OPERATION

RESULTS

CONCLUSION