Abstract

One of the challenges faced by many video providers is the heterogeneity of network specifications, user requirements, and content compression performance. The universal solution of a fixed bitrate ladder is inadequate for ensuring a high quality of user experience without re-buffering or annoying compression artifacts. However, a content-tailored solution, based on exhaustively encoding across all resolutions and over a wide quality range, is highly expensive in terms of computational, financial, and energy costs. Motivated by this, we propose an approach that exploits machine learning to predict a content-optimized bitrate ladder for on-demand video services. The method extracts spatio-temporal features from the uncompressed content, trains machine-learning models to predict the Pareto front parameters and, based on these, builds the ladder within a defined bitrate range. The method has the benefit of significantly reducing the number of encodes required per sequence. The presented results, based on 100 HEVC-encoded sequences, demonstrate a reduction in the number of encodes required when compared to an exhaustive search and an interpolation-based method, by 89.06% and 61.46%, respectively, at the cost of an average Bjøntegaard Delta Rate difference of 1.78% compared to the exhaustive approach. Finally, a hybrid method is introduced that selects either the proposed or the interpolation-based method depending on the sequence features. This results in an overall 83.83% reduction of required encodings at the cost of an average Bjøntegaard Delta Rate difference of 1.26%.

Highlights

  • In recent reports on internet traffic volumes [1], the share occupied by video data is predicted to reach 80% by 2023 with anticipation of further rises subsequently

  • In this paper we have proposed a reduced-complexity, content-customised solution that can predict the bitrate ladder for adaptive streaming, based on spatio-temporal features extracted from uncompressed video at its native resolution

  • When compared to the exhaustive search, the results show a mean BD-Rate loss of only 1.78% and a mean BD-PSNR (Peak Signal-to-Noise Ratio) of 0.04 dB, with an average reduction of 89.06% in the number of encodings needed


Summary

Introduction

In recent reports on internet traffic volumes [1], the share occupied by video data is predicted to reach 80% by 2023, with further rises anticipated subsequently. Many video service providers invest a significant amount of resources into optimizing video compression parameters prior to transmission [3]–[5]. This enables them to increase user satisfaction by meeting varying end-user constraints while maintaining the highest possible level of delivered video quality. For example, a given mobile phone is likely to receive a different encoded version of the same source video on a 5G network than it would on a 4G network. These encodes may vary in terms of both compression ratio and spatial resolution. This means that an end-user's device might receive content compressed at a lower resolution, which is then upscaled to the device's native resolution prior to display.
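To make the idea of encodes that vary in both compression ratio and spatial resolution concrete, the following minimal sketch selects, for each target bitrate, the best available encode from a pool spanning several resolutions. It illustrates the general notion of a Pareto front over rate-quality points, as used by an exhaustive search; it is not the prediction method proposed in the paper. The names `Encode`, `pareto_front`, and `build_ladder`, the target bitrates, and all rate and quality values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Encode(NamedTuple):
    resolution: str       # label of the encoding resolution, e.g. "1080p"
    bitrate_kbps: float   # bitrate of the encode
    quality_db: float     # quality (e.g. PSNR) measured after upscaling


def pareto_front(encodes: List[Encode]) -> List[Encode]:
    """Return the encodes that are not dominated by any other encode.

    An encode is kept only if its quality exceeds that of every encode
    with a lower or equal bitrate. The result is sorted by bitrate.
    """
    ordered = sorted(encodes, key=lambda e: (e.bitrate_kbps, -e.quality_db))
    front: List[Encode] = []
    best_quality = float("-inf")
    for enc in ordered:
        if enc.quality_db > best_quality:
            front.append(enc)
            best_quality = enc.quality_db
    return front


def build_ladder(front: List[Encode], targets_kbps: List[float]) -> List[Encode]:
    """For each target, pick the highest-quality front point within budget."""
    ladder: List[Encode] = []
    for target in targets_kbps:
        within_budget = [e for e in front if e.bitrate_kbps <= target]
        if within_budget:
            # The front is sorted by bitrate with strictly increasing
            # quality, so the last affordable point is the best one.
            ladder.append(within_budget[-1])
    return ladder


# Hypothetical rate-quality points for one sequence (not from the paper).
SAMPLE = [
    Encode("540p", 500, 32.0), Encode("540p", 1000, 34.0), Encode("540p", 2000, 35.0),
    Encode("1080p", 500, 30.0), Encode("1080p", 1000, 33.5),
    Encode("1080p", 2000, 36.5), Encode("1080p", 4000, 39.0),
    Encode("2160p", 2000, 35.5), Encode("2160p", 4000, 39.5), Encode("2160p", 8000, 42.0),
]

if __name__ == "__main__":
    front = pareto_front(SAMPLE)
    for rung in build_ladder(front, [600, 1500, 3000, 6000]):
        print(rung.resolution, rung.bitrate_kbps, rung.quality_db)
```

In this toy data the lower resolution wins at low bitrates and the higher resolutions take over as the bitrate budget grows, which is why a single fixed ladder can be a poor fit for a given piece of content. Building the full pool of encodes is the costly step that a predictive approach aims to avoid.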
