Abstract

We present a novel, comprehensive approach to compressing and accelerating Visual Question Answering (VQA) systems. Our method compresses the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network together, so that both parts of the model are accelerated at once. We apply different decomposition methods and regression strategies to different layers: Canonical Polyadic, Tucker, and Tensor Train decompositions are used to decompose the fully connected layers in the CNN and LSTM. The flattening layer and the fully connected layer at the end of the model are replaced with Tensor Regression layers. To reduce the parameter count further, the feature flow between layers is compressed by a Tensor Contraction layer. We evaluated the proposed tensor decomposition model on the VQA 2.0 dataset with Pythia as the baseline model. It achieved compression ratios of 77% to 91% with an accuracy drop of only 1% to 5%.
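To give a sense of where savings of this kind can come from, the following is a minimal sketch of the parameter count of a fully connected layer stored in Tensor Train (TT-matrix) format, compared with its dense form. It is not the authors' implementation. The layer size, mode factorization, and TT ranks are hypothetical values chosen for illustration, biases are ignored, and "compression ratio" is assumed here to mean the fraction of parameters removed, which the abstract does not define.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Weights in a dense layer of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Weights in the same layer stored as TT-matrix cores.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]),
    and the boundary ranks must both equal 1.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the modes")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


def compression_ratio(original, compressed):
    """Fraction of parameters removed (assumed definition, see lead-in)."""
    return 1.0 - compressed / original


# Hypothetical example: a 4096 x 4096 layer, each side factored as 8*8*8*8,
# with all internal TT ranks set to 4.
in_modes = (8, 8, 8, 8)
out_modes = (8, 8, 8, 8)
ranks = (1, 4, 4, 4, 1)

n_dense = dense_params(in_modes, out_modes)
n_tt = tt_params(in_modes, out_modes, ranks)
ratio = compression_ratio(n_dense, n_tt)
print(f"dense: {n_dense}, TT: {n_tt}, removed: {ratio:.2%}")
```

A single decomposed layer can shrink far more than the whole-model figures reported above, since a full model also contains layers that are not decomposed or that need higher ranks to preserve accuracy.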
