Abstract
Deep Learning (DL) has consistently surpassed other Machine Learning methods and achieved state-of-the-art performance in multiple cases. Several modern applications like financial and recommender systems require models that are constantly updated with fresh data. The prominent approach for keeping a DL model fresh is to trigger full retraining from scratch when enough new data are available. However, retraining large and complex DL models is time-consuming and compute-intensive. This makes full retraining costly, wasteful, and slow. In this paper, we present an approach to continuously train and deploy DL models. First, we enable continuous training through proactive training that combines samples of historical data with new streaming data. Second, we enable continuous deployment through gradient sparsification that allows us to send a small percentage of the model updates per training iteration. Our experimental results with LeNet5 on MNIST and modern DL models on CIFAR-10 show that proactive training keeps models fresh with comparable—if not superior—performance to full retraining at a fraction of the time. Combined with gradient sparsification, sparse proactive training enables very fast updates of a deployed model with arbitrarily large sparsity, reducing communication per iteration up to four orders of magnitude, with minimal—if any—losses in model quality. Sparse training, however, comes at a price; it incurs overhead on the training that depends on the size of the model and increases the training time by factors ranging from 1.25 to 3 in our experiments. Arguably, a small price to pay for successfully enabling the continuous training and deployment of large DL models.
Highlights
Deep Learning (DL) is a subfield of Machine Learning (ML), involving Deep Neural Network (DNN) models, which has shown huge success in recent years
Our experiments find the sparse proactive training method to be suitable for continuous training and deployment
We enable the continuous deployment of very large DL models borrowing ideas from distributed DL training to sparsify weight updates per iteration in order to reduce the deployment cost
Summary
Deep Learning (DL) is a subfield of Machine Learning (ML), involving Deep Neural Network (DNN) models, which has shown huge success in recent years. It has dramatically improved the state-of-the-art in many fields, like speech recognition [1], computer vision [2], and natural lan-. Germany guage understanding [3] This success is explained by the fact that the quality of DL models improves with increasing dataset sizes, due to their ability to learn representations directly from data. DNN results are not interpretable [4] Their optimization is not theoretically well understood, relying on non-convex optimization [5]. DL models can be massive in size; recently, the GPT-3 [3] architecture featured a staggering 175 billion parameters
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have