Abstract

Image captioning is a comprehensive task spanning computer vision (CV) and natural language processing (NLP). It converts images to text: the algorithm automatically generates descriptive text for an input image. In this paper, we present an end-to-end model that uses a deep convolutional neural network (CNN) as the encoder and a recurrent neural network (RNN) as the decoder. To obtain better feature extraction for image captioning, we propose a highly modularized multi-branch CNN, which increases accuracy while keeping the number of hyper-parameters unchanged. This strategy yields a simply designed network consisting of parallel sub-modules of the same structure. While traditional CNNs go deeper and wider to increase accuracy, our proposed method is more effective with a simple design that is easier to optimize for practical applications. Experiments are conducted on the Flickr8k, Flickr30k and MSCOCO datasets. Results demonstrate that our method achieves state-of-the-art performance in terms of caption quality.
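The encoder–decoder pipeline described above can be illustrated with a toy sketch: a stand-in "encoder" maps the image to an initial hidden state, and a simple recurrent step greedily emits one word at a time. All names, sizes, and the tiny vocabulary below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<start>", "a", "dog", "runs", "<end>"]
V, H = len(vocab), 6

# Toy random parameters standing in for a trained model (assumed, not from the paper).
W_h = rng.standard_normal((H, H)) * 0.3   # recurrent weights
W_e = rng.standard_normal((V, H)) * 0.3   # word-embedding weights
W_o = rng.standard_normal((H, V)) * 0.3   # hidden-to-vocabulary weights
W_enc = rng.standard_normal((10, H)) * 0.3

def encode_image(image):
    # Stand-in for the CNN encoder: project image features to the initial hidden state.
    return np.tanh(image @ W_enc)

def greedy_caption(image, max_len=5):
    h = encode_image(image)
    token = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        # One RNN step: combine the previous word embedding with the hidden state.
        h = np.tanh(h @ W_h + W_e[token])
        token = int(np.argmax(h @ W_o))   # greedy choice of the next word
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return " ".join(words)

caption = greedy_caption(rng.standard_normal(10))
print(type(caption))
```

In a real system the greedy loop is usually replaced with beam search, and the encoder is a pretrained CNN rather than a random projection; the sketch only shows the image-to-text control flow.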

Highlights

  • As an important source of information, numerous images are digitally stored and transmitted on the Internet

  • Image captioning is a comprehensive task in computer vision (CV) [1] and natural language processing (NLP) [2], which can complete multi-modal conversion from image to text

  • The recurrent neural network is a mature technology in NLP and plays an important role in time-series data, using previous information to assist the current task


Summary

Introduction

As an important source of information, numerous images are digitally stored and transmitted on the Internet. Building on significant advances of the encoder–decoder structure in machine translation [18,19], a generative model called Neural Image Caption (NIC) [20] was proposed. Yan et al. [26] proposed a hierarchical attention mechanism that uses both global CNN features and local object features for improved results. Despite these advancements, such refined methods come with complicated network structures and a growing number of parameters, which limits the ability to adapt the network architectures to other datasets and tasks. To address these issues, we propose a simple end-to-end image captioning model with an extended CNN architecture. We first propose a new multi-branch CNN model based on residual learning for image captioning. The improved network has a large receptive field, which is important for learning the complex relationships among object categories.
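The core idea of a multi-branch block built on residual learning can be sketched as follows: several parallel sub-modules of identical topology transform the input, their outputs are aggregated, and an identity shortcut is added. This is a minimal NumPy sketch of that aggregation pattern (in the spirit of ResNeXt-style blocks); the branch structure and dimensions are assumed for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_branch(dim_in, dim_mid):
    # Every branch shares the same topology: reduce -> nonlinearity -> expand.
    w_reduce = rng.standard_normal((dim_in, dim_mid)) * 0.1
    w_expand = rng.standard_normal((dim_mid, dim_in)) * 0.1
    def branch(x):
        h = np.maximum(x @ w_reduce, 0.0)  # reduce width, apply ReLU
        return h @ w_expand                # expand back to the input width
    return branch

def multi_branch_block(x, branches):
    # Aggregate the parallel branches, then add the identity shortcut
    # (residual learning): y = x + sum_i branch_i(x).
    return x + sum(b(x) for b in branches)

# Four parallel sub-modules of the same structure (the "cardinality").
branches = [make_branch(8, 2) for _ in range(4)]
x = rng.standard_normal((1, 8))
y = multi_branch_block(x, branches)
print(y.shape)  # output keeps the input width, so blocks can be stacked
```

Because every branch has the same shape, increasing the number of branches adds capacity without introducing new kinds of hyper-parameters, which matches the modular design goal stated above.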

