Abstract
The scarcity of labeled exemplars makes video classification with supervised neural networks challenging. An external memory that contains task-related knowledge is a useful way to learn a category from a handful of samples. However, most existing memory-augmented neural networks struggle to handle multi-modal external data because of its high dimensionality and massive volume. To address this, we propose a Memory Transformation Network (MTN) that converts external knowledge into embedded and concentrated memories, so that the knowledge can be used effectively for video classification with weak supervision. Specifically, we employ a multi-modal deep autoencoder to project external visual and textual information onto a shared space. This produces a joint embedded memory that captures the correlations among different modalities and thereby enhances its expressive ability. Because the autoencoder inherently reduces dimensionality, it also alleviates the curse of dimensionality. In addition, an attention-based compression mechanism generates a concentrated memory that records the information relevant to a specific task. The resulting concentrated memory is lightweight, which mitigates the cost of time-consuming content-based addressing over a large-volume memory. On three real-world video datasets, our model outperforms state-of-the-art methods by 5.44% and 1.81% on average in two metrics, demonstrating its effectiveness and superiority on visual classification with limited labeled exemplars.
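The abstract does not give the exact formulation of the attention-based compression or the addressing step. The sketch below is therefore only an illustration of the general idea and not the MTN implementation. It makes several assumptions. The toy 2-D vectors stand in for the joint embedded memory, which the autoencoder would produce (the autoencoder is not shown). The "task queries" are hypothetical per-class prototypes built from the few labeled samples. Each concentrated slot is simply a scaled dot-product attention summary of the full memory. Addressing uses standard NTM-style content-based reads: a softmax over cosine similarities, sharpened by a key strength `beta`. All function names (`compress_memory`, `content_read`, and the others) are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights: List[float], memory: List[Vector]) -> Vector:
    dim = len(memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(dim)]

def attend(query: Vector, memory: List[Vector]) -> Vector:
    # Scaled dot-product attention: softmax(q . m_i / sqrt(d)) pooling over slots.
    d = len(query)
    weights = softmax([dot(query, slot) / math.sqrt(d) for slot in memory])
    return weighted_sum(weights, memory)

def compress_memory(memory: List[Vector], task_queries: List[Vector]) -> List[Vector]:
    # Hypothetical compression: one concentrated slot per task query,
    # so N large-memory slots shrink to K = len(task_queries) slots.
    return [attend(q, memory) for q in task_queries]

def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def content_read(key: Vector, memory: List[Vector], beta: float = 1.0) -> Vector:
    # Content-based addressing: softmax over beta-scaled cosine similarities.
    weights = softmax([beta * cosine(key, slot) for slot in memory])
    return weighted_sum(weights, memory)

# Toy "embedded memory" (N = 6 slots) and two task queries (K = 2).
embedded = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [-1.0, 0.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
concentrated = compress_memory(embedded, queries)
read = content_read([1.0, 0.0], concentrated, beta=5.0)
print(len(concentrated), read)
```

Each later read then scores only the K concentrated slots instead of all N embedded ones. This illustrates why a compressed memory reduces the cost of content-based addressing when N is large.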