Abstract

We propose a set of optimization techniques for transforming a generic AI codebase so that it can be successfully deployed to a restricted serverless environment without compromising capability or performance. These involve (1) slimming the libraries and frameworks used (e.g., PyTorch) down to the pieces pertinent to the solution; (2) dynamically loading pre-trained AI/ML models into local temporary storage during serverless function invocation; (3) using separate frameworks for training and inference, with ONNX model formatting; and (4) performance-oriented tuning of data storage and lookup. The techniques are illustrated via worked examples that have been deployed live on geospatial data from the transportation domain, drawing upon a real-world case study in intelligent transportation: on-demand, real-time prediction of flows of train movements across the UK rail network. Evaluation of the proposed techniques shows that the response time, for varying volumes of queries involving prediction, remains almost constant (at 50 ms) even as the database scales up to 250M entries. Query response time is important in this context because the target is predicting train delays, and it matters even more in a serverless environment due to the stringent constraints on serverless functions' runtime before timeout. The similarity of a serverless environment to other resource-constrained environments (e.g., IoT, telecoms) means the techniques can be applied to a range of use cases.
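
As a brief, hedged illustration of techniques (2) and (3), the sketch below shows a Python serverless function handler that, on cold start, downloads an ONNX-exported model from object storage into the function's local temporary storage and then serves predictions with a lightweight inference runtime. The bucket name, model key, and input feature layout are hypothetical placeholders, not the paper's actual artifacts.

    import json
    import os

    import boto3
    import numpy as np
    import onnxruntime as ort

    MODEL_BUCKET = "rtf-models"          # hypothetical bucket name
    MODEL_KEY = "flow_predictor.onnx"    # hypothetical model artifact
    LOCAL_MODEL_PATH = "/tmp/flow_predictor.onnx"  # function's local temp storage

    _session = None  # cached so warm invocations skip the download entirely

    def _get_session():
        global _session
        if _session is None:
            if not os.path.exists(LOCAL_MODEL_PATH):
                # Technique (2): pull the pre-trained model into /tmp at invocation time
                boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, LOCAL_MODEL_PATH)
            # Technique (3): inference via ONNX Runtime, not the training framework
            _session = ort.InferenceSession(LOCAL_MODEL_PATH)
        return _session

    def handler(event, context):
        # Assumes the request carries a flat numeric feature vector (hypothetical layout)
        features = np.asarray(event["features"], dtype=np.float32).reshape(1, -1)
        session = _get_session()
        input_name = session.get_inputs()[0].name
        prediction = session.run(None, {input_name: features})[0]
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction.tolist()})}

Because the heavy training framework is only needed offline to produce the ONNX artifact (e.g., via PyTorch's torch.onnx.export), the deployment package can be slimmed down to the inference runtime alone, in the spirit of technique (1).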

Highlights

  • Standard architectures for deploying AI workloads currently mirror typical client-server architectures, with the AI models and data sitting on the server side and requests coming from the client side

  • We used example use cases from the Real Time Flow (RTF) project to illustrate the key ideas behind the techniques

  • We have presented and detailed a set of serverless code optimization techniques that can be used to transform production AI workloads on big data so that they can be deployed in a serverless architecture

Summary

Introduction

Standard architectures for deploying AI workloads currently mirror typical client-server architectures, with the AI models and data sitting on the server side and requests coming from the client side.

D. (AWS Ecosystem Specific) Improving Data Lookup Speeds for Dealing with Maximum Function Lifetime Restrictions

Another optimization we considered for our use case on the RTF Project is to work with the data itself that is used to serve predictions.
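
To make the idea concrete, here is a minimal sketch assuming the precomputed prediction records live in a key-value store (DynamoDB here; the table name and key schema are invented for illustration). Serving each query through a single key-based lookup instead of a scan keeps response time near-constant as the dataset grows, which is consistent with the flat 50 ms query times reported in the abstract even at 250M entries.

    import boto3

    # Hypothetical table of precomputed flow records, keyed so that every
    # prediction query resolves to a single-item fetch rather than a scan.
    _table = boto3.resource("dynamodb").Table("rtf-train-flows")

    def lookup_flow(service_id: str, timestamp: str):
        """Fetch one record by its composite key; cost is independent of table size."""
        response = _table.get_item(Key={"service_id": service_id, "timestamp": timestamp})
        return response.get("Item")  # None if no record exists for this key

Key-based access patterns like this also help the function complete well within the platform's maximum lifetime, which is the restriction this optimization targets.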

