ABSTRACT

Deploying end‐to‐end ML applications on edge resources has become a viable way to meet performance requirements and data regulations. With a microservice architecture, these applications can scale dynamically, improving service availability under varying workloads. However, orchestrating multiple end‐to‐end ML applications that share computing resources in heterogeneous edge environments raises numerous challenges. Prevalent orchestration tools and frameworks for edge ML serving provision resources inefficiently due to constrained capacity, diverse resource demands, and varied utilization patterns. In this work, we present a provisioning method that optimizes resource utilization for end‐to‐end ML applications on a heterogeneous edge. By profiling every microservice within an application, we estimate its scale and allocate it to a hardware platform with sufficient resources, taking runtime utilization patterns into account. We also provide several practical analyses of runtime monitoring metrics to detect and mitigate resource contention, guaranteeing performance. Experiments with three real‐world ML applications demonstrate the practicality of our method on a heterogeneous edge cluster of Raspberry Pis and Jetson Developer Kits.