Next-generation core networks are expected to achieve service-oriented traffic management for diversified Quality-of-Service (QoS) provisioning based on software-defined networking (SDN) and network function virtualization (NFV). In this article, a learning-based transmission protocol customized for Video-on-Demand (VoD) streaming services is proposed for a Cybertwin-enabled next-generation core network. The protocol provides caching-based congestion control and throughput enhancement functionalities at the edge of the core network based on traffic prediction. The per-slot traffic load of a VoD streaming service at an ingress edge node is predicted using the autoregressive integrated moving average (ARIMA) model. To balance the tradeoff between network congestion and throughput enhancement, a multiarmed bandit (MAB) problem is formulated to maximize the expected overall network performance in the long run by capturing the relationship between transmission control actions and QoS provisioning. A comprehensive operation framework for the transmission protocol is also presented, with in-network congestion control and throughput enhancement modules. Simulation results validate the efficacy of the proposed protocol in terms of packet delay, goodput ratio, throughput, and resource utilization.
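To illustrate the MAB idea of learning which transmission control action yields the best long-run performance, the following is a minimal sketch and not the protocol's actual formulation. The abstract does not state the bandit algorithm, the action set, or the reward definition, so everything here is an assumption: the standard UCB1 rule is used as a stand-in, and the action names (`ACTIONS`) and per-action reward probabilities (`TRUE_MEAN_REWARD`) are hypothetical placeholders for a QoS-based reward observed once per time slot.

```python
import math
import random

# Hypothetical transmission control actions (assumption, not from the article).
ACTIONS = ["cache_more", "hold", "release_more"]
# Hypothetical probability that each action yields a "good QoS" slot (assumption).
TRUE_MEAN_REWARD = [0.2, 0.5, 0.9]


def ucb1(reward_fn, n_arms, n_slots, c=2.0):
    """Run UCB1 for n_slots time slots; return per-arm pull counts and mean rewards."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_slots + 1):
        if t <= n_arms:
            # Initialization: try every action once.
            arm = t - 1
        else:
            # Pick the action with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def make_bernoulli_reward(seed=0):
    """Build a seeded reward function returning 1.0 or 0.0 per slot."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < TRUE_MEAN_REWARD[arm] else 0.0


if __name__ == "__main__":
    counts, means = ucb1(make_bernoulli_reward(), len(ACTIONS), n_slots=2000)
    for name, n, m in zip(ACTIONS, counts, means):
        print(f"{name}: pulled {n} times, mean reward {m:.2f}")
```

In this sketch the exploration bonus shrinks as an action is selected more often, so the learner gradually concentrates on the action with the highest observed reward while still occasionally probing the others. In the proposed protocol, the reward would instead reflect the measured congestion and throughput outcome of each control action.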