A Design Space Exploration Methodology for Enabling Tensor Train Decomposition in Edge Devices

Milad Kokhazadeh, Vasilios Kelefouras, Georgios Keramidas, Iakovos Stamoulis

doi:10.1007/978-3-031-15074-6_11





Abstract


Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Modern DNNs are typically both compute and memory intensive, which makes their deployment on edge devices challenging. A well-known technique to address this issue is Low-Rank Factorization (LRF), where a weight tensor is approximated with one or more lower-rank tensors, reducing the number of executed instructions and the memory footprint. However, finding an efficient solution is a complex and time-consuming process, as LRF involves a huge design space and different solutions provide different trade-offs in terms of FLOPs, memory size, and prediction accuracy. In this work, a methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) Design Space Exploration (DSE) problem. The DSE space is then drastically pruned by removing inefficient solutions. Our experimental results show that it is possible to output a limited set of solutions with better accuracy, memory, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a standalone, parameterized module integrated into the T3F library for TensorFlow 2.X.

Keywords: Deep neural networks, Network compression, Low-rank factorization, Tensor train, Design space exploration
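To make the memory side of this trade-off concrete, the following is a minimal sketch that counts the parameters of a dense weight matrix and of a tensor-train (TT) matrix factorization of the same shape. It is not the authors' module and does not use the T3F API. The helper names, the layer shape (784 x 625, split into four modes) and the candidate ranks are all hypothetical choices made for illustration. It assumes the standard TT-matrix layout, in which core k has shape (r_k, m_k, n_k, r_{k+1}) with boundary ranks equal to 1.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameter count of the unfactorized weight matrix."""
    return prod(in_modes) * prod(out_modes)


def tt_matrix_params(in_modes, out_modes, ranks):
    """Parameter count of a TT-matrix with the given mode sizes and TT-ranks.

    Core k is assumed to have shape (ranks[k], in_modes[k], out_modes[k], ranks[k+1]).
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT-ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d)
    )


if __name__ == "__main__":
    # Hypothetical layer: 784 inputs (4*7*4*7), 625 outputs (5*5*5*5).
    in_modes, out_modes = (4, 7, 4, 7), (5, 5, 5, 5)
    full = dense_params(in_modes, out_modes)
    # Sweep a few uniform interior ranks; each choice is one point in the design space.
    for r in (2, 4, 8, 16):
        ranks = (1, r, r, r, 1)
        tt = tt_matrix_params(in_modes, out_modes, ranks)
        print(f"rank {r:2d}: {tt:6d} params, {full / tt:.1f}x smaller than dense")
```

Even this small sweep hints at why the design space grows quickly: the way each dimension is split into modes, the number of cores, and every interior rank can vary independently, and each combination would still need to be evaluated for FLOPs and prediction accuracy.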
