Mobile edge computing (MEC) provides extremely low-latency services to mobile users by attaching computing resources to 5G base stations in an MEC network. Network service providers can cache their services from remote data centers at base stations to serve nearby mobile users, thereby reducing service latency. However, mobile users of network services usually issue bursty requests that require immediate processing in the MEC network. The data traffic of such requests is bursty because mobile users have various hidden features, including locations, group tags, and mobility patterns. This bursty traffic in turn causes uncertain congestion at base stations and thus uncertain processing delays. Given the limited resources of base stations, network services may not be placed at base stations permanently to handle the bursty traffic. Services therefore need to be cached dynamically in the MEC network to cope with both bursty data traffic and uncertain processing delays.

In this paper, we investigate the problem of dynamic service caching and task offloading in an MEC network, adopting online learning techniques to tackle the challenges posed by bursty data traffic and uncertain processing delays. We first propose an online learning algorithm for the problem with uncertain processing delays, built on the multi-armed bandit technique, and analyze its regret bound. We then propose another online learning algorithm for the problem with both bursty data traffic and uncertain processing delays, which adaptively learns the bursty traffic of requests from small samples of mobile users' hidden features. We also propose a novel Generative Adversarial Network (GAN) architecture that accurately predicts user demands from such small samples.
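As a rough illustration of the multi-armed bandit idea invoked above (not the paper's actual algorithm), a UCB1-style learner can treat each base station as an arm and learn, from observed processing delays alone, which station to offload requests to. All names and the delay model below are illustrative assumptions.

```python
import math
import random

def ucb1_offload(delay_fn, num_stations, num_rounds):
    """UCB1-style base-station selection under uncertain delays.

    Each base station is an arm; observed processing delay plays the
    role of (negated) reward, so we pick the arm minimizing a lower
    confidence bound: estimated mean delay minus an exploration bonus.
    """
    counts = [0] * num_stations          # times each station was chosen
    mean_delay = [0.0] * num_stations    # running mean of observed delays
    choices = []
    for t in range(1, num_rounds + 1):
        if t <= num_stations:
            arm = t - 1  # initialization: try every station once
        else:
            arm = min(
                range(num_stations),
                key=lambda a: mean_delay[a]
                - math.sqrt(2 * math.log(t) / counts[a]),
            )
        d = delay_fn(arm)  # observe a noisy processing delay
        counts[arm] += 1
        mean_delay[arm] += (d - mean_delay[arm]) / counts[arm]
        choices.append(arm)
    return choices, mean_delay

# Usage with a toy delay model: station 1 has the lowest true delay.
random.seed(0)
true_delay = [5.0, 2.0, 4.0]
choices, est = ucb1_offload(
    lambda a: random.gauss(true_delay[a], 0.5), num_stations=3, num_rounds=2000
)
```

Over the rounds, the learner concentrates its choices on the station with the lowest mean delay while still occasionally probing the others, which is exactly the exploration/exploitation trade-off that the regret bound of a bandit algorithm quantifies.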
Based on the proposed GAN model, we then devise an efficient heuristic for the problem under both bursty data traffic and uncertain processing delays. We finally evaluate the proposed algorithms by simulations using a real data trace. Experimental results show that the proposed algorithms outperform existing ones by up to 44% in terms of average delay.