Abstract

Low power consumption is an important requirement for battery-limited systems such as mobile devices. Many applications on mobile devices demand heavy computation and therefore consume considerable power. Video encoding and decoding is one such application, and it needs dedicated low-power design at both the algorithm and architecture levels to reduce power consumption. A video encoding system consists of several coding components, including motion estimation (ME), discrete cosine transform (DCT), inverse discrete cosine transform (IDCT), entropy coding, and others depending on the standard. These components have different computational characteristics, so different algorithms and architectures are needed to meet the low-power requirement while maintaining performance. MPEG-4 is a video compression standard established in 1999 and has been widely adopted for video compression since then. On mobile devices, the MPEG-4 simple profile (SP) is popular because of its simplicity and good coding performance. It contains basic but useful encoding components, such as ME, DCT, IDCT, AC/DC prediction, and variable length coding. We analyze key components of the MPEG-4 SP encoder, namely ME, DCT, and IDCT, and develop suitable low-power algorithms and architectures for them. After optimizing each module, we integrate them into a proposed low-power MPEG-4 SP encoder. The power consumption of ME is reduced by a fast algorithm and a two-dimensional bandwidth-sharing architecture. The power consumption of DCT and IDCT is reduced by content awareness. These algorithms achieve substantial power reduction while maintaining tolerable coding performance. At the circuit level, a fine-grained, leaf-based gated-clock technique is applied to most registers in the design. A 2-D data sharing architecture is proposed for the ME design. To reduce computational complexity, a moving-window search with a modified predictor scheme is adopted. It reduces computation with a degradation of less than 0.05 dB compared with full search.
The final bandwidth requirement is reduced to 0.65% of that of full search without data sharing. AdaptiveDCT is proposed for content-aware computation. It combines several low-power techniques and solves the precision problem through coefficient scaling, a hybrid architecture, and the proposed content classification algorithm. The high probability of zero occurrence is exploited in the IDCT and in the data transfer between quantization (Q) and variable length coding (VLC). Our IDCT adopts the design previously proposed by Xanthopoulos [1], with coefficient scaling. It achieves low power consumption when computing on zero-valued data. A zero marker scheme is proposed to avoid transferring zero-valued data. The occurrence of zero-valued data is recorded in registers, so memory read/write operations for zero-valued data can be avoided. This reduces memory access between Q and VLC by 60% to 80%. Finally, the encoder chip is fabricated in the TSMC 0.18 μm 1P6M CMOS process. It contains 201K logic gates and 4.56 KB of SRAM. It supports CIF 30 fps encoding with acceptable performance and supports VGA 30 fps as an extended resolution. The post-layout gate-level power consumption estimated by Synopsys PrimePower is 5.9 mW for I-VOP encoding and 9.7 mW for P-VOP encoding at 1.8 V in CIF 30 fps encoding. The real power estimation of this chip is 2.5 mW for I-VOP encoding and 5 mW for P-VOP encoding at 1.3 V in CIF 30 fps encoding. This is a large power reduction compared with previous works. H.264 is the newest video compression standard, developed by the Joint Video Team (JVT). It can reduce the bit rate by 39%, 49%, and 64% compared with MPEG-4, H.263, and MPEG-2, respectively. Its excellent coding performance has led to wide adoption in commercial applications, including digital TV broadcasting, next-generation DVD, and network streaming. This coding performance makes H.264 suitable for high-resolution video compression, but it also brings a huge computation overhead and consumes substantial hardware resources and power.
To address this problem, we focus on the integer motion estimation (IME) part of the H.264 encoder. It accounts for most of the computation, especially at high resolutions such as high-definition TV (HDTV). We propose a hierarchical ME algorithm that reduces the computational complexity to 0.45% of that of full search and improves the coding performance. The corresponding architecture can process block matching at three different levels and supports a good data sharing scheme at each of them. These properties make it suitable for low-power H.264 encoder design for high-resolution applications.
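
As an illustration of block matching at three levels, the following is a minimal Python sketch of a generic coarse-to-fine search, not the algorithm or architecture proposed in this work. It assumes a three-level pyramid built by 2x2 averaging, a sum-of-absolute-differences (SAD) cost, a small search window at the coarsest level, and a refinement of one pixel at each finer level. The function names, the parameters (`n`, `levels`, `coarse_radius`), and the search ranges are all hypothetical choices made for this example.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at
    (cx, cy) and the n x n block of ref at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def downsample(frame):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def search(cur, ref, bx, by, n, center, radius):
    """Return the motion vector (dx, dy) with the lowest SAD among all
    candidates within +/-radius of center; candidates whose block would
    fall outside ref are skipped."""
    best_cost, best_mv = None, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= len(ref[0]) - n and 0 <= ry <= len(ref) - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv


def hierarchical_me(cur, ref, bx, by, n=16, levels=3, coarse_radius=2):
    """Coarse-to-fine motion estimation for the n x n block of cur at
    (bx, by). Assumes bx, by and n are multiples of 2**(levels - 1)."""
    # Build the pyramids; index 0 is full resolution.
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(levels - 1):
        cur_pyr.append(downsample(cur_pyr[-1]))
        ref_pyr.append(downsample(ref_pyr[-1]))

    mv = (0, 0)
    for lvl in range(levels - 1, -1, -1):
        scale = 1 << lvl
        # Wider search only at the coarsest level, +/-1 refinement below it.
        radius = coarse_radius if lvl == levels - 1 else 1
        mv = search(cur_pyr[lvl], ref_pyr[lvl],
                    bx // scale, by // scale, n // scale, mv, radius)
        if lvl > 0:
            # Project the vector onto the next finer level.
            mv = (mv[0] * 2, mv[1] * 2)
    return mv
```

With these assumed defaults, the sketch evaluates 25 candidates on a 4x4 block and then 9 candidates at each of the two finer levels, instead of every candidate at full resolution. This shows where the computation saving of a hierarchical search comes from, although the figures reported above depend on the actual algorithm and search range.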

