The growing number of AI-driven applications on mobile devices has led to solutions that integrate deep learning models with available edge-cloud resources. Split learning, in which a deep learning model is split between the mobile device and edge-cloud resources and computed in a distributed manner, has become an extensively explored topic owing to benefits such as reduced on-device energy consumption, improved latency, lower network usage, and certain privacy gains. Incorporating compression-aware methods, where training adapts to the compression level of the communicated data, makes split learning even more advantageous and could even offer a viable alternative to traditional methods such as federated learning. In this work, we develop an adaptive compression-aware split learning method (“deprune”) that trains deep learning models to be far more network-efficient, making them well suited for deployment on resource-constrained devices with the help of edge-cloud resources. We also extend this method (“prune”) to train deep learning models very quickly through a transfer learning approach, trading a small loss in accuracy for much more network-efficient inference. We show that the “deprune” method reduces network usage by 4× compared with a split-learning approach that does not use our method, without loss of accuracy, while also improving accuracy over compression-aware split learning by up to 4 percent. Finally, we show that the “prune” method can reduce training time for certain models by up to 6× without affecting accuracy, compared with a compression-aware split-learning approach.
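To make the split-learning-with-compression setting concrete, the sketch below shows a model partitioned into a device-side portion ending in a narrow bottleneck (so fewer activation bytes cross the network) and a server-side portion that completes the forward pass, with gradients flowing back across the split. This is only an illustration of the general idea under assumed layer sizes and class names (DeviceSide, ServerSide, bottleneck_channels); it is not the paper's “prune”/“deprune” method, whose architectures and compression schedules are not specified in the abstract.

```python
# Minimal sketch of split learning with a compression bottleneck (PyTorch).
# Assumption-heavy illustration: layer sizes, the 1x1-conv "compression", and all
# names here are hypothetical and not taken from the paper.
import torch
import torch.nn as nn

class DeviceSide(nn.Module):
    """Early layers that run on the mobile device, ending in a narrow
    bottleneck that shrinks the activations sent over the network."""
    def __init__(self, bottleneck_channels=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Fewer channels at the split point -> fewer bytes transmitted uplink.
        self.compress = nn.Conv2d(64, bottleneck_channels, 1)

    def forward(self, x):
        return self.compress(self.features(x))

class ServerSide(nn.Module):
    """Remaining layers that run on the edge/cloud server."""
    def __init__(self, bottleneck_channels=8, num_classes=10):
        super().__init__()
        self.decompress = nn.Conv2d(bottleneck_channels, 64, 1)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, z):
        return self.head(torch.relu(self.decompress(z)))

# One training step: the device computes compressed activations, the server
# finishes the forward pass, and gradients flow back across the split point.
device_net, server_net = DeviceSide(), ServerSide()
opt = torch.optim.SGD(
    list(device_net.parameters()) + list(server_net.parameters()), lr=0.01
)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
z = device_net(x)  # activations that would be transmitted device -> server
loss = nn.functional.cross_entropy(server_net(z), y)
opt.zero_grad(); loss.backward(); opt.step()
```

In an adaptive compression-aware scheme, the width of the bottleneck (here bottleneck_channels) would be varied during training rather than fixed, which is the kind of knob the abstract's network-usage versus accuracy trade-off refers to.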