Edge Workloads Monitoring and Failover: a StarlingX-Based Testbed Implementation and Measurement Study

Mohammed Abuibaid, Aidan Seguin-Mcpeake, Thomas Yungblut, Amir Hossein Ghorab, Owen Yuen, Marc St-Hilaire

doi:10.1109/access.2022.3204976

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 2
License type: CC BY 4.0

Affiliation: Carleton University

Abstract

With the ever-growing number of time-critical, compute-intensive, and private IoT applications, High Availability (HA) Edge Clouds become indispensable. Realizing HA Edge Clouds is inherently challenging due to the geographically dispersed hierarchy of the Distributed Cloud Infrastructure (DCI). For example, frequent isolation between the central Cloud and Edge Clouds due to networking instability necessitates some autonomous operations at the Edge Clouds. Furthermore, configuring the Edge functions (i.e., control, compute, and storage) in HA clusters reduces downtime, but because Edge Clouds have fewer resources than central Clouds, it also limits Edge scalability. To that end, StarlingX is developing an HA-protected and scalable DCI virtualization platform based on the open-source ecosystem, focusing on low-touch management of Edge Clouds. StarlingX provides a fault management service that realizes DCI-wide alarming and logging capabilities, allowing for rapid response to virtualized infrastructure events. Recently, the IETF Network Working Group proposed that monitoring both the DCI and the Edge workloads (software containers) is critical for an Edge Computing Platform to maintain HA IoT application deployment. Indeed, because the workloads can suffer a fatal failure while the infrastructure remains stable and healthy, failover functionality must monitor both the infrastructure and the Edge workloads. In this paper, motivated by the IETF direction, we first propose a dynamic failover functionality that centrally monitors Edge workloads to recover from deployment or Edge node failures. Second, we experimentally optimize the failover functionality for monitoring a microservice-architected IoT application deployed on a StarlingX-based DCI testbed to collect temperature sensor readings from Raspberry Pis. The recorded failover measurements reveal that, regardless of how quickly the Edge workload health checks are collected, the recovery time does not drop below a floor determined by Edge resources and network speed. Furthermore, reducing the statistics collection timeout reduces the recovery time of an Edge node failure. However, when the timeout value is less than the minimum achievable recovery time, false-positive failures (FPFs) can occur. Third, to supplement the StarlingX fault management service, we provide a modular implementation of the proposed failover functionality. Finally, we present the first-ever introduction of the StarlingX platform’s software stack to promote its use in academic research.
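
The relationship between the statistics collection timeout, the recovery-time floor, and false-positive failures can be illustrated with a toy timing model. This is only a sketch and not the paper's implementation: the class `FailoverModel`, its method names, the additive recovery formula, and all numeric values are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverModel:
    """Toy timing model of timeout-based failover (all values hypothetical)."""

    min_recovery_s: float  # floor set by Edge resources and network speed
    timeout_s: float       # statistics collection timeout

    def is_declared_failed(self, silence_s: float) -> bool:
        # The monitor declares a failure once a workload has reported
        # no statistics for longer than the timeout.
        return silence_s > self.timeout_s

    def node_failure_recovery_s(self) -> float:
        # Assumption: detection delay (the timeout) plus the redeployment floor.
        return self.timeout_s + self.min_recovery_s

    def false_positive_possible(self) -> bool:
        # Assumption: a workload stays silent while it is being redeployed,
        # which takes at least min_recovery_s. A shorter timeout can therefore
        # flag a workload that is still starting up.
        return self.is_declared_failed(self.min_recovery_s)


for timeout in (30.0, 5.0):
    model = FailoverModel(min_recovery_s=10.0, timeout_s=timeout)
    print(timeout, model.node_failure_recovery_s(), model.false_positive_possible())
```

Under these assumptions, lowering the timeout shortens the modeled recovery time, but once it falls below the floor the monitor can flag a workload that is merely still recovering.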
