Abstract

Deployed AI platforms typically ship with bulky system architectures that present bottlenecks and a high risk of failure. A serverless deployment can mitigate these factors and provide a cost-effective, automatically scalable (up or down), elastic, real-time, on-demand AI solution. However, deploying high-complexity production workloads into serverless environments is far from trivial, due to factors such as a minimal allowance for physical codebase size, limited runtime memory, lack of GPU support, and a maximum runtime before termination via timeout. In this paper we propose a set of optimization techniques and show how they transform a codebase that was previously incompatible with a serverless deployment into one that can be deployed successfully in a serverless environment, without compromising capability or performance. The techniques are illustrated via worked examples that have been deployed live, producing real-time predictions on train movements on the UK rail network from live rail data. The similarities of a serverless environment to other resource-constrained environments (IoT, mobile) mean the techniques can be applied to a range of use cases.
