Motion estimation (ME) plays a crucial role in removing temporal redundancy for video compression. However, ME imposes a substantial computational burden during encoding because of the exhaustive evaluation of candidates within the search window. In view of the increasing computing capacity of GPUs, we propose a GPU-based low-delay parallel ME scheme for High Efficiency Video Coding (HEVC). In particular, considering the quadtree coding structure of HEVC, we achieve parallelization hierarchically by optimizing the ME process at the coding tree unit (CTU), prediction unit (PU), and motion vector (MV) layers. Specifically, in the CTU layer, a novel motion vector predictor determination scheme is proposed to alleviate the side effects of inaccurate MV prediction caused by the removal of the CTU-level dependency. In the PU layer, a novel indexing table is designed to realize an efficient cost derivation strategy, so that the cost of each PU can be computed in a convenient and efficient manner. In the MV layer, we propose a compact descriptor that represents an MV and its corresponding cost as a whole, such that redundant branches can be avoided in the search process. With this optimization strategy, the proposed scheme completely eliminates the encoding time spent on ME by the CPU. Experimental results demonstrate that the proposed scheme achieves 41% encoding time savings while accelerating ME by up to 12.7 times, with an average Bjøntegaard delta bit rate (BD-BR) loss of only 0.52%. Moreover, further experimental results show that the proposed GPU-based ME achieves up to 200 times acceleration compared with full-search ME on the CPU.
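To make the MV-layer idea concrete, the sketch below shows one plausible realization of a "compact descriptor" that fuses an MV and its cost into a single word; it is not the authors' code, and the kernel name, 8x8 block size, and launch shapes are illustrative assumptions. Packing the SAD cost into the high bits and the MV components into the low bits lets a single atomicMin() both compare costs and record the winning MV, so the divergent per-candidate "if (cost < bestCost)" branches of a naive parallel search are avoided.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Pack cost (high 32 bits) and MV components (low 32 bits) into one 64-bit
// word. Because the cost occupies the most significant bits, atomicMin over
// the packed word is equivalent to a min over costs, with ties broken by MV.
__device__ __forceinline__ uint64_t pack(uint32_t cost, int16_t mvx, int16_t mvy) {
    return (uint64_t(cost) << 32) |
           (uint64_t(uint16_t(mvx)) << 16) |
            uint64_t(uint16_t(mvy));
}

// One thread per candidate MV in a (2R+1) x (2R+1) search window centered at
// (0,0). 'cur' is an 8x8 current block; 'ref' is the padded reference area.
__global__ void me8x8(const uint8_t* cur, const uint8_t* ref,
                      int refStride, int R, unsigned long long* best) {
    int mvx = int(blockIdx.x * blockDim.x + threadIdx.x) - R;
    int mvy = int(blockIdx.y * blockDim.y + threadIdx.y) - R;
    if (mvx > R || mvy > R) return;  // grid may overshoot the window

    // Sum of absolute differences for this candidate displacement.
    uint32_t sad = 0;
    for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x)
            sad += abs(int(cur[y * 8 + x]) -
                       int(ref[(y + mvy + R) * refStride + (x + mvx + R)]));

    // One atomic op replaces the cost-comparison branch for every candidate
    // (64-bit atomicMin requires compute capability 3.5 or higher).
    atomicMin(best, (unsigned long long)pack(sad, (int16_t)mvx, (int16_t)mvy));
}
```

On the host, unpacking the winner is a shift and two casts: the high 32 bits give the best cost and the low 32 bits give the chosen MV, so the entire candidate evaluation completes on the GPU without a CPU-side reduction.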