Abstract

Deploying Machine Learning (ML) prediction or Data Analytics (DA) processing as a service behind a Web API is not a trivial task. A number of configuration settings and dependency requirements must be met before an ML or DA solution can run successfully. In addition, an application that consumes such an API must be highly available so that multiple users can submit their requests concurrently. ML modeling and DA processing are resource-intensive: some tasks finish within minutes or hours, while others take several days to complete. In this paper, we design and develop a scalable architecture of API services for hosting ML models or DA functionalities in a production-grade deployment. Containerization and container orchestration technologies, i.e., Docker and Kubernetes, are employed to automate the deployment, scaling, and management of containerized ML or DA instances. To meet high-scale and high-availability requirements, the open-source message broker RabbitMQ, also containerized in Docker, schedules incoming requests as task messages; these messages are placed in a task queue and later processed consecutively. Nginx and Node.js with Express.js, likewise containerized, serve as the web server and the API provider, respectively. We validate the architecture with a case study of an intelligent system for processing national research-grant documents.
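
To make the queueing pattern concrete, the sketch below shows one way the API side of such an architecture could look: an Express.js endpoint that accepts a long-running job, publishes it to RabbitMQ as a durable task message, and returns immediately so that a worker instance can process it later. This is a minimal illustration, not the paper's implementation; the queue name, route, and payload shape are assumptions introduced for the example.

```typescript
// Minimal sketch: Express.js API that enqueues ML/DA jobs into RabbitMQ.
// "ml_tasks", "/tasks", and the message shape are illustrative assumptions.
import express from "express";
import amqp from "amqplib";
import { randomUUID } from "crypto";

const QUEUE = "ml_tasks"; // assumed queue name

async function main() {
  // Broker URL would normally come from the container environment.
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://rabbitmq:5672");
  const channel = await conn.createChannel();
  await channel.assertQueue(QUEUE, { durable: true }); // survive broker restarts

  const app = express();
  app.use(express.json());

  // Clients submit resource-intensive jobs here; the API replies at once with
  // 202 Accepted and a task id, while separate workers consume the queue.
  app.post("/tasks", (req, res) => {
    const taskId = randomUUID();
    const message = Buffer.from(JSON.stringify({ taskId, payload: req.body }));
    channel.sendToQueue(QUEUE, message, { persistent: true });
    res.status(202).json({ taskId, status: "queued" });
  });

  app.listen(3000, () => console.log("API listening on :3000"));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Replying with 202 Accepted instead of blocking keeps the API responsive even when a single task takes hours or days, which is precisely the situation the task queue is meant to absorb.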
