Abstract
AbstractDeploying huge deep learning applications on resource-constrained edge devices is a challenging task. Cloud-based edge computing is a promising solution. Such as model partitioning, a portion of the deep learning model is deployed on the edge device; while, the remaining portion is executed by the cloud. Leveraging the computation power of edge devices, transmission latency is reduced, and bandwidth efficiency is increased. Recently, visual transformer models, supported by large datasets, have dominated in multiple vision tasks. However, model partitioning optimization methods for visual transformers are lacking. Therefore, the paper proposes a cosine similarity-based token subsampling method for visual transformer model partitioning to improve transmission efficiency. Tokens in the same class are subsampled and only the centroid tokens are uploaded. In the cloud, all tokens are reconstructed based on interpolation indexes. Three algorithm implementations are proposed and measured on PC, Jetson NANO and edge CPU Cortex-A53. The experimental results demonstrate that the recommended algorithm implementation can be executed with low-latency of 71.24 ms, and 35.65% transmitted data is reduced with an accuracy drop of 0.46%.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.