Video compression is a complex task, with an almost unlimited number of potential encoding decision permutations. While software-based encoders can, in theory, exhaustively explore all possible encoding options to maximize compression efficiency, such an approach can consume as much central processing unit (CPU) power as is available. A direct consequence is a poor tradeoff between the processing required and the bitrate reduction achieved, especially since technical and commercial limitations often constrain the amount of processing available to any given application. A long-standing aim for encoders is, therefore, to optimize the use of the available computing resources, either by driving down bitrates (and therefore delivery costs) or by reducing the CPU cost of achieving the required quality. A common approach to reducing CPU cost is to take shortcuts and make decisions up front based on less information, in an attempt to guess or shortlist decisions without exhaustively evaluating all options. However, the diverse and extensive permutations of options available in modern codecs make these decisions very complex, so this approach normally leads to less bitrate-efficient compression. If means can be found to utilize the CPU more effectively, then the objective of improved encoder performance, measured as bitrate reduction, infrastructure cost savings, or a combination of both, can be achieved. This is the application space considered in this article: how to use artificial intelligence (AI) and machine learning (ML) to make encoding decisions efficiently in real time, based on the characteristics of the incoming content, so that the available CPU power is used more effectively (or, equivalently, so that less CPU power is needed). Because the AI processing itself also consumes CPU power, such an application requires carefully tuned approaches to guarantee that the combination of the AI processing and its dynamically adapted encoding algorithm achieves a better bitrate-versus-CPU tradeoff at the same quality than standard algorithms can.
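To make the tradeoff concrete, the sketch below (not the article's actual system; every name, feature, and weight is an illustrative assumption) shows the general shape of ML-guided mode pruning: cheap per-block content features feed a tiny pre-trained model whose scores shortlist candidate coding modes, so the expensive rate-distortion search runs only on the survivors. The approach pays off only when feature extraction plus inference costs less CPU than the search it avoids.

```python
"""Hypothetical sketch of ML-guided encoding mode pruning.

All mode names, features, and weights are illustrative assumptions,
not the method described in the article.
"""
import numpy as np

# Candidate coding modes a block-level encoder might otherwise
# evaluate exhaustively with a full rate-distortion search.
CANDIDATE_MODES = ["skip", "inter_16x16", "inter_8x8", "intra"]


def block_features(block: np.ndarray) -> np.ndarray:
    """Lightweight content features for one pixel block:
    mean, variance, and horizontal/vertical gradient energy."""
    gy, gx = np.gradient(block.astype(np.float64))
    return np.array([block.mean(), block.var(),
                     np.abs(gx).mean(), np.abs(gy).mean()])


def shortlist_modes(block: np.ndarray, weights: np.ndarray,
                    keep: int = 2) -> list[str]:
    """Score each candidate mode with a tiny linear model and keep
    only the top `keep` modes for the full rate-distortion search."""
    scores = weights @ block_features(block)   # one score per mode
    top = np.argsort(scores)[::-1][:keep]      # best-scoring modes first
    return [CANDIDATE_MODES[i] for i in top]


# Usage with illustrative, untrained weights (4 modes x 4 features);
# a real system would learn these offline from encoded training content.
rng = np.random.default_rng(0)
weights = rng.standard_normal((len(CANDIDATE_MODES), 4))
block = rng.integers(0, 256, size=(16, 16))
print(shortlist_modes(block, weights))
```

In this framing, the net saving is the CPU spent on the pruned-away mode evaluations minus the CPU spent on features and inference; the bitrate penalty is whatever the shortlist misses relative to the exhaustive search, which is the balance the article's tuning must get right.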