Despite its early promise, Deep Reinforcement Learning (DRL) suffers from several challenges in adaptive bitrate (ABR) streaming stemming from the uncertainty and noise in network conditions. In this paper, we find that although these challenges complicate the training process, in practice their effects can be substantially mitigated by addressing a key overlooked factor: the skewed input trace distribution in DRL training datasets. We introduce a generalized framework, Plume, to automatically identify and balance the skew using a three-stage process. First, we identify the critical features that determine the behavior of the traces. Second, we classify the traces into clusters. Finally, we prioritize the salient clusters to improve the overall performance of the controller. We implement our ideas in a novel ABR controller, Gelato, and evaluate its performance against state-of-the-art controllers in the real world for more than a year, streaming 59 stream-years of television to over 280,000 users on the live streaming platform Puffer. Gelato trained with Plume outperforms all baseline solutions and becomes the first controller on the platform to deliver statistically significant improvements in both video quality and stalling, decreasing stalls by as much as 75%.
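To make the three-stage process concrete, the sketch below illustrates one way such a trace-balancing pipeline could be structured. The specific feature choices (mean throughput and throughput variability), the use of k-means clustering, and the inverse-frequency prioritization weights are illustrative assumptions for this sketch, not the actual mechanisms described in the paper.

```python
# Illustrative sketch of a three-stage trace-balancing pipeline in the spirit of Plume.
# Assumptions: the features, k-means clustering, and inverse-frequency weights below
# are placeholders; the paper's actual feature selection and prioritization may differ.
import numpy as np
from sklearn.cluster import KMeans

def extract_features(trace):
    """Stage 1: summarize a throughput trace (Mbps samples) by assumed critical features."""
    t = np.asarray(trace, dtype=float)
    return np.array([t.mean(), t.std()])  # assumed features: throughput level and variability

def balance_traces(traces, n_clusters=4, seed=0):
    """Stages 2-3: cluster traces, then compute per-trace sampling weights that
    up-weight under-represented clusters so DRL training sees a balanced mix."""
    feats = np.stack([extract_features(t) for t in traces])
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(feats)
    counts = np.bincount(labels, minlength=n_clusters)
    weights = 1.0 / counts[labels]   # inverse cluster frequency (assumed prioritization scheme)
    weights /= weights.sum()         # normalize into a sampling distribution
    return labels, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Skewed synthetic dataset: many stable traces, few highly variable ones.
    stable = [rng.normal(5.0, 0.2, 100) for _ in range(90)]
    bursty = [np.abs(rng.normal(2.0, 2.0, 100)) for _ in range(10)]
    labels, weights = balance_traces(stable + bursty)
    # Sample a training batch according to the balanced weights.
    batch_idx = rng.choice(len(weights), size=32, p=weights)
    print("cluster sizes:", np.bincount(labels))
```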