The intensive computation of High Efficiency Video Coding (HEVC) engenders challenges for the hardwired encoder in terms of the hardware overhead and the power dissipation. On the other hand, the constrains in hardwired encoder design seriously degrade the efficiency of software oriented fast coding unit (CU) partition mode decision algorithms. A fast algorithm is attributed as VLSI friendly, when it possesses the following properties. First, the maximum complexity of encoding a coding tree unit (CTU) could be reduced. Second, the parallelism of the hardwired encoder should not be deteriorated. Third, the process engine of the fast algorithm must be of low hardware- and power-overhead. In this paper, we devise the convolution neural network based fast algorithm to decrease no less than two CU partition modes in each CTU for full rate-distortion optimization (RDO) processing, thereby reducing the encoder's hardware complexity. As our algorithm does not depend on the correlations among CU depths or spatially nearby CUs, it is friendly to the parallel processing and does not deteriorate the rhythm of RDO pipelining. Experiments illustrated that, an averaged 61.1% intraencoding time was saved, whereas the Bjøntegaard-Delta bit-rate augment is 2.67%. Capitalizing on the optimal arithmetic representation, we developed the high-speed [714 MHz in the worst conditions (125 °C, 0.9 V)] and low-cost (42.5k gate) accelerator for our fast algorithm by using TSMC 65-nm CMOS technology. One accelerator could support HD1080p at 55 frames/s real-time encoding. The corresponding power dissipation was 16.2 mW at 714 MHz. Finally, our accelerator is provided with good scalability. Four accelerators fulfill the throughput requirements of UltraHD-4K at 55 frames/s.
Read full abstract