Abstract
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive training, learning-based methods have delivered impressive efficiency and accuracy improvements across the entire transmission pipeline. For example, the high model capacity of learning-based architectures enables us to model image and video behavior accurately enough that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Highlights
During the past few years, we have witnessed an unprecedented change in the way multimedia data are generated and consumed, as well as the wide adoption of image/video in an increasing number of driving applications
The same conclusions apply to machine learning-based video coding methods as those cited in Section 3.1, where parts of the video compression pipeline are replaced by machine learning-enabled ones, as well as those where the entire pipeline has been replaced by a machine learning-based end-to-end video codec
We provide an overview of the main machine learning-based video streaming systems used in both wireless and wired streaming and clearly show the advances that machine learning has unlocked within this context
Summary
During the past few years, we have witnessed an unprecedented change in the way multimedia data are generated and consumed, as well as the wide adoption of image/video in an increasing number of driving applications. Although machine learning-based multimedia communication and coding systems achieve significant gains in comparison to conventional systems, several sources of suboptimality remain: (i) these research areas have been studied and developed in a fragmented way; (ii) coding and communication frameworks remain human-centric while an increasing number of applications are machine-centric; (iii) interactivity is not fully taken into account when designing an end-to-end multimedia communication system; (iv) machine learning-based image/video coding systems still rely on entropy coding, which compromises error resilience and complicates the transport protocols; (v) multimedia delivery systems have been optimized for the structure of the bitstream generated by conventional image/video coders and perform suboptimally when used to transport bitstreams generated by machine learning-based image/video coding systems; and (vi) the latest use cases, for example, ITS, AR/VR/XR, and so forth, have ultra-low latency requirements that cannot be met by optimizing a part of the ecosystem separately.