Abstract

This paper presents several optimization algorithms for a High Efficiency Video Coding (HEVC) encoder based on single instruction, multiple data (SIMD) operations and data-level parallelism. Our analysis of the computational complexity of the HEVC encoder shows that the interpolation filter, cost function, and transform together account for around 68% of the total computation on average. We present several software optimization techniques for a fast HEVC encoder, including a frame-level interpolation filter and SIMD implementations of these computationally intensive parts. In addition, we propose slice-level parallelization on multi-core platforms, together with a load-balancing algorithm that distributes work according to the estimated computational load of each slice during encoding. The proposed parallelized HEVC encoder runs approximately ten times faster than the HEVC reference model (HM) software, with minimal loss of coding efficiency.
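The frame-level interpolation filter mentioned above can be illustrated with a minimal sketch. It assumes the standard HEVC 8-tap luma filters and interprets "frame level" as computing each horizontal sub-sample plane of a reference frame once, so that motion search reads precomputed samples instead of re-filtering overlapping blocks; the paper's exact scheme may differ. The helper names (`interp_row`, `interp_frame`) are hypothetical, only horizontal filtering is shown, and the single-stage rounding `(acc + 32) >> 6` simplifies the HM's higher intermediate precision. The inner tap loop is the part a SIMD implementation would vectorize; Python only shows the arithmetic.

```python
# HEVC luma 8-tap interpolation filters; each coefficient set sums to 64.
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),    # quarter-sample position
    2: (-1, 4, -11, 40, 40, -11, 4, -1),  # half-sample position
    3: (0, 1, -5, 17, 58, -10, 4, -1),    # three-quarter-sample position
}


def clip8(v):
    """Clip to the 8-bit sample range."""
    return 0 if v < 0 else 255 if v > 255 else v


def interp_row(row, frac):
    """Interpolate one 8-bit row at horizontal offset frac/4 (hypothetical helper).

    Output sample i lies between row[i] and row[i + 1]; samples outside the
    row are replaced by the nearest edge sample (edge padding).
    """
    taps = LUMA_FILTERS[frac]
    n = len(row)
    out = []
    for i in range(n):
        acc = 0
        for k, c in enumerate(taps):       # the loop a SIMD version vectorizes
            j = min(max(i - 3 + k, 0), n - 1)
            acc += c * row[j]
        out.append(clip8((acc + 32) >> 6))  # divide by 64 with rounding
    return out


def interp_frame(frame):
    """Compute every horizontal sub-sample plane of a frame once, up front."""
    return {frac: [interp_row(r, frac) for r in frame] for frac in LUMA_FILTERS}
```

For example, `interp_row([0, 0, 0, 0, 64, 64, 64, 64], 2)[3]` yields 32, the half-sample value midway across the step. Once `interp_frame` has run, every prediction block that points into this reference frame reuses the same planes rather than repeating the filtering.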

Highlights

  • Along with the development of multimedia and hardware technologies, the demand for high-resolution video services with better quality has been increasing

  • Experimental results: we show the performance of the proposed optimization techniques for the High Efficiency Video Coding (HEVC) encoder in terms of Bjøntegaard delta bitrate (BD-BR) [28], Bjøntegaard delta peak signal-to-noise ratio (BD-PSNR) [28], and average time saving (ATS)

  • A PC equipped with an Intel® Core™ i7-3930K CPU and 16 GB of memory was used for this evaluation
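The ATS metric listed in the highlights is not defined in this summary. The sketch below assumes a common definition: the per-sequence percentage of encoding time saved relative to the HM anchor, averaged over the test sequences. It also shows how a time saving maps to the speed-up factor quoted in the abstract. The function names are illustrative. BD-BR and BD-PSNR are omitted because they require curve fitting over several rate-distortion points.

```python
def time_saving(t_ref, t_prop):
    """Percentage of encoding time saved relative to the reference (HM) encoder."""
    return (t_ref - t_prop) / t_ref * 100.0


def average_time_saving(pairs):
    """Mean of per-sequence time savings; pairs is [(t_ref, t_prop), ...]."""
    return sum(time_saving(r, p) for r, p in pairs) / len(pairs)


def speedup_from_saving(ats_percent):
    """Convert a time saving in percent into a speed-up factor t_ref / t_prop."""
    return 1.0 / (1.0 - ats_percent / 100.0)
```

Because the relation is 1 / (1 - ATS), a time saving of 90% corresponds to a 10× speed-up, and each further percentage point of saving near that range raises the speed-up sharply.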


Summary

Introduction

Along with the development of multimedia and hardware technologies, the demand for high-resolution video services with better quality has been increasing.

Proposed slice-level parallelism with load balance

To reduce the computational load of rate-distortion (RD) optimization, early termination and mode competition algorithms have been adopted in the HM reference software [23,24,25]. Zhang's algorithm [26] performs adaptive data partitioning for MPEG-2 video encoders, adjusting the computational load of each partition based on the complexity of a previously encoded frame of the same picture type.
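The summary does not give the paper's load-estimation formula. The following sketch assumes, in the spirit of the Zhang-style approach described above, that per-CTU loads are taken from the encoding times of the previous frame of the same picture type, and that each slice is a contiguous run of CTUs in raster order encoded on its own core. The function names and the `history` dictionary are hypothetical.

```python
import bisect
from itertools import accumulate


def estimate_ctu_loads(history, picture_type, num_ctus):
    """Hypothetical estimator: reuse per-CTU encoding times from the most recent
    frame of the same picture type (e.g. 'I', 'P', 'B'); fall back to a
    uniform load when no such frame has been encoded yet."""
    loads = history.get(picture_type)
    if loads is None or len(loads) != num_ctus:
        return [1.0] * num_ctus
    return list(loads)


def partition_slices(ctu_loads, num_slices):
    """Split raster-ordered CTU loads into contiguous, non-empty slices.

    Boundaries are placed where the cumulative estimated load is closest to
    k * total / num_slices, so each slice (one per core) receives roughly the
    same work. Returns (start, end) CTU ranges with end exclusive.
    """
    n = len(ctu_loads)
    if not 1 <= num_slices <= n:
        raise ValueError("need 1 <= num_slices <= number of CTUs")
    if any(load < 0 for load in ctu_loads):
        raise ValueError("loads must be non-negative")
    prefix = list(accumulate(ctu_loads))  # non-decreasing cumulative load
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_slices):
        target = k * total / num_slices
        i = bisect.bisect_left(prefix, target)  # first CTU reaching the target
        # End the slice before or after CTU i, whichever is closer to the target.
        end = i if i > 0 and target - prefix[i - 1] <= prefix[i] - target else i + 1
        end = max(end, bounds[-1] + 1)        # keep this slice non-empty
        end = min(end, n - (num_slices - k))  # leave >= 1 CTU per remaining slice
        bounds.append(end)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))


def slice_loads(ctu_loads, slices):
    """Total estimated load per slice, for checking the balance."""
    return [sum(ctu_loads[s:e]) for s, e in slices]
```

For loads `[8, 1, 1, 1, 1, 1, 1, 1, 1]` and two slices, `partition_slices` returns `[(0, 1), (1, 9)]` with loads 8 and 8, whereas a split by CTU count (4 + 5 CTUs) would give loads of 11 and 5 and leave one core idle for longer.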
