DIUSum: Dynamic Image Utilization for Multimodal Summarization

Min Xiao, Feifei Zhai, Chengqing Zong, Junnan Zhu, Yu Zhou

doi:10.1609/aaai.v38i17.29899

Abstract

Existing multimodal summarization approaches focus on fusing image features during encoding, ignoring that the need for images varies when generating different summaries. Yet, both intuitively and empirically, not all images improve summary quality. We therefore propose a novel Dynamic Image Utilization framework for multimodal Summarization (DIUSum) that selects and utilizes valuable images for summarization. First, to predict whether an image helps produce a high-quality summary, we propose an image selector that scores the usefulness of each image. Second, to utilize the multimodal information dynamically, we incorporate hard and soft guidance from the image selector. Under this guidance, the image information is plugged into the decoder to generate a summary. Experimental results show that DIUSum outperforms multiple strong baselines and achieves state-of-the-art results on two public multimodal summarization datasets. Further analysis demonstrates that the image selector reflects how much the images improve summary quality.
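
The abstract does not say how the selector scores images or how the hard and soft guidance are computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical selector that scores an image as the sigmoid of its dot product with a text vector, reads "hard guidance" as dropping images whose score falls below a threshold, and reads "soft guidance" as scaling the remaining image features by their score. The names `score_images`, `guide_images`, and `threshold` are invented for the example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_images(text_vec: Vector, image_vecs: Sequence[Vector]) -> List[float]:
    """Hypothetical selector: usefulness = sigmoid(dot(text, image)).

    This scoring rule is an assumption made for illustration only.
    """
    return [
        sigmoid(sum(t * v for t, v in zip(text_vec, img)))
        for img in image_vecs
    ]


def guide_images(
    image_vecs: Sequence[Vector],
    scores: Sequence[float],
    threshold: float = 0.5,
) -> List[List[float]]:
    """Apply the assumed hard and soft guidance to image features."""
    guided = []
    for img, score in zip(image_vecs, scores):
        if score < threshold:
            # Assumed hard guidance: discard images judged unhelpful.
            continue
        # Assumed soft guidance: weight the kept features by the score.
        guided.append([score * v for v in img])
    return guided


# Toy inputs: one image aligned with the text, one pointing away from it.
text = [1.0, 0.0]
images = [[2.0, 0.0], [-2.0, 1.0]]

scores = score_images(text, images)
kept = guide_images(images, scores)
```

In this toy setup only the first image passes the threshold, and its features are scaled down by its score before they would be handed to a decoder. How the features actually enter the decoder is left out because the abstract gives no detail on it.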



Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

Graph-based Multimodal Ranking Models for Multimodal Summarization
Junnan Zhu ... Yu Zhou
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 20
26 May 2021

CtnR: Compress-then-Reconstruct Approach for Multimodal Abstractive Summarization
Chenxi Zhang ... Jiangfeng Li
18 Jul 2021

Multi-modal anchor adaptation learning for multi-modal summarization
Zhongfeng Chen ... Fan Xu
Neurocomputing | VOL. 570
15 Dec 2023

Causal Video Summarizer for Video Exploration
Jia-Hong Huang ... Pin-Yu Chen
18 Jul 2022
