Abstract

Network Function Virtualization (NFV) migrates the carrier-grade LTE Evolved Packet Core (EPC) that runs on commodity boxes to the public cloud. In the new virtualized environment, the LTE EPC must offer high availability to its mobile users when failures occur. Achieving high service availability is challenging because the failover procedure must keep latency-sensitive control-plane procedures intact during failures. Our empirical study shows that existing cloud recovery mechanisms and standardized LTE solutions are coarse-grained and therefore unable to recover quickly from failures. They result in LTE service outages, lost network connectivity, and slow recovery. To address these issues, we describe a new design for a fault-tolerant LTE EPC that provides quick failure detection and timely recovery from failed operations. To reduce failure detection time, it treats frequent retransmission of LTE control-plane signaling within the EPC as an indication of failure. To recover from failure, it adapts a checkpointing-based rollback recovery approach to the LTE context and addresses the known shortcomings of classic checkpointing. Our design is LTE standard-compliant and works as a plug-and-play solution without modifying existing LTE implementations. Our results show that this approach can recover from a failure in 2.6 seconds and incurs only tens of milliseconds of overhead.

