Abstract

In this paper, we introduce BAIPAS (Big Data and AI based Prediction and Analysis System), a distributed deep learning platform. Deep learning on big data requires long training times, and distributed deep learning is one way to reduce them. When the data resides in external storage, training is slowed further by the network I/O needed to load data during deep learning operations. We propose data locality management as a way to reduce training time on big data. BAIPAS is a distributed deep learning platform that aims to provide fast training on big data, easy installation and monitoring of the platform, and convenience for developers of deep learning models. To provide fast training on big data, the data is distributed across worker-server storage using data locality management and shuffling before training is performed. The data locality manager analyzes the training data and the state information of the worker servers, and schedules data placement according to each worker server's available storage space and training performance. However, if each worker server trains only on its local partition of the data, the model may overfit compared with training on the full training data set. To solve this problem, we apply a shuffling method that moves already-learned data to another worker server as training proceeds, so that each worker server eventually covers the full training data set. BAIPAS uses Kubernetes and Docker to provide easy installation and monitoring of the platform. It also provides pre-processing modules, management tools, automated cluster creation, resource monitoring, and other supporting tools, so developers can easily build deep learning models.
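The abstract describes two mechanisms: locality-aware data placement driven by each worker's available storage and training performance, and a shuffling step that moves already-learned data to other workers. The sketch below is a minimal illustration of that idea under assumptions of our own, not the paper's actual implementation; the Worker, assign_shards, and shuffle_round names, the storage-times-throughput score, and the ring rotation of shards are all hypothetical choices made for illustration.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Worker:
        name: str
        free_storage_gb: float      # available storage on the worker server
        throughput: float           # measured training throughput (samples/sec)
        shards: List[str] = field(default_factory=list)

    def assign_shards(workers: List[Worker], shards: List[str]) -> None:
        """Distribute data shards in proportion to each worker's capacity score.

        The score combines available storage and observed training throughput,
        mirroring the abstract's idea of scheduling by storage space and
        learning performance; the exact weighting is not given in the paper.
        """
        total = sum(w.free_storage_gb * w.throughput for w in workers)
        quotas = [round(len(shards) * (w.free_storage_gb * w.throughput) / total)
                  for w in workers]
        it = iter(shards)
        for w, quota in zip(workers, quotas):
            for _ in range(quota):
                try:
                    w.shards.append(next(it))
                except StopIteration:
                    return
        # any leftovers from rounding go to the fastest worker
        for s in it:
            max(workers, key=lambda w: w.throughput).shards.append(s)

    def shuffle_round(workers: List[Worker]) -> None:
        """After an epoch, rotate each worker's already-learned shards to the
        next worker so that, over successive epochs, every worker sees the
        full training set instead of only its local partition."""
        partitions = [w.shards for w in workers]
        partitions = partitions[-1:] + partitions[:-1]   # simple ring rotation
        for w, new_shards in zip(workers, partitions):
            w.shards = new_shards

    if __name__ == "__main__":
        workers = [Worker("worker-1", 500, 1200.0),
                   Worker("worker-2", 250, 800.0),
                   Worker("worker-3", 1000, 1500.0)]
        shards = [f"shard-{i:03d}" for i in range(12)]
        assign_shards(workers, shards)
        for epoch in range(3):
            # ... local training on w.shards would happen here ...
            shuffle_round(workers)
            print(epoch, {w.name: w.shards for w in workers})

In this sketch the faster, larger worker receives more shards up front, and the per-epoch rotation gradually exposes every worker to all shards; how BAIPAS weights storage against performance and triggers the shuffle is described in the full paper, not here.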
