Memory bandwidth has become a design bottleneck in video coding systems, especially at high-end display resolutions. Embedded compression (EC) can efficiently reduce the memory bandwidth between a video coding system and its frame memory, and reducing the bandwidth also saves memory access power. The tremendous memory bandwidth, however, poses a serious design challenge for the throughput of an EC engine. In this study, a hardware-efficient EC algorithm and architecture are proposed, comprising three core techniques: adaptive weighting-average prediction (AWP), fraction embedding method (FEM), and partition-based binary coding (PBC). AWP switches prediction weightings according to the surrounding texture condition and produces the residuals of an $8 \times 8$ block. Each residual consists of an integer part and a fraction part (FP). FEM embeds the FP into the residual's sign bit, improving coding efficiency. PBC adaptively partitions the residuals into groups without complex filtering and converts them into efficient codewords. The proposed design achieves a competitive compression ratio (CR) of 2.39, which corresponds to a memory bandwidth saving of 58%. Its hardware architecture is implemented in TSMC 0.18-$\mu\text{m}$ CMOS technology and features smooth data scheduling and regular data paths. The throughput of both encoding and decoding can reach 12.8 Gpixels/s. Compared with other lossless EC codecs, the proposed design demonstrates better overall performance in CR, energy utilization, and area organization.
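The link between the reported CR and the reported bandwidth saving can be checked with a short calculation. The sketch below assumes that CR is defined as original size divided by compressed size and that memory traffic scales in proportion to the compressed data size; neither definition is stated in the abstract, and the function name `bandwidth_saving` is hypothetical.

```python
def bandwidth_saving(compression_ratio: float) -> float:
    """Fraction of memory traffic removed for a given compression ratio.

    Assumption: CR = original size / compressed size, and traffic is
    proportional to the amount of compressed data transferred.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


# CR value taken from the abstract.
saving = bandwidth_saving(2.39)
print(f"{saving:.1%}")  # prints "58.2%"
```

Under these assumptions a CR of 2.39 gives a saving of about 58%, consistent with the figure quoted above.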