KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Subhadarshi Panda, Rituraj Singh, Suraj Modi, Godawari Sudhakar Rao

doi:10.1609/aaai.v38i17.29844

Abstract

Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain-of-thought (CoT) reasoning, which enables step-by-step thinking. Extending LLMs with multimodal capabilities has attracted recent interest, but it incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT, a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers. By incorporating external knowledge from KGs during reasoning, the model gains a deeper contextual understanding, reducing hallucinations and enhancing the quality of answers. This knowledge-augmented CoT reasoning empowers the model to handle questions requiring external context and to provide more informed answers. Experimental findings show that KAM-CoT outperforms state-of-the-art methods. On the ScienceQA dataset, we achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by more than 18 percentage points and GPT-4 (83.99%) by nearly 10 percentage points. Remarkably, KAM-CoT achieves these results with only 280M trainable parameters at a time, demonstrating its cost-efficiency and effectiveness.
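The accuracy comparison in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three ScienceQA figures quoted above; the helper names `margin_points` and `relative_gain` are hypothetical and chosen for illustration. It separates the absolute gap (in percentage points) from the relative gain (in percent), since the two are easy to conflate.

```python
# Average ScienceQA accuracies (%) as quoted in the abstract.
accuracy = {"KAM-CoT": 93.87, "GPT-3.5": 75.17, "GPT-4": 83.99}


def margin_points(model: str, baseline: str) -> float:
    """Absolute gap in percentage points (hypothetical helper)."""
    return round(accuracy[model] - accuracy[baseline], 2)


def relative_gain(model: str, baseline: str) -> float:
    """Gap expressed as a percentage of the baseline's accuracy (hypothetical helper)."""
    gap = accuracy[model] - accuracy[baseline]
    return round(100 * gap / accuracy[baseline], 2)


for baseline in ("GPT-3.5", "GPT-4"):
    print(
        f"KAM-CoT vs {baseline}: "
        f"{margin_points('KAM-CoT', baseline)} points, "
        f"{relative_gain('KAM-CoT', baseline)}% relative"
    )
```

The absolute gaps come out to 18.7 and 9.88 percentage points, which matches the rounded margins stated in the abstract.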

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

Utilizing Large Language Models for Geoscience Literature Information Extraction
Peng Yu ... Cheng Deng
09 Mar 2024

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang ... Ruixiang Tang
ACM Transactions on Knowledge Discovery from Data | VOL. 18
26 Apr 2024

A Bibliometric Review of Large Language Models Research from 2017 to 2023
Lizhou Fan ... Sanggyu Lee
ACM Transactions on Intelligent Systems and Technology | VOL. 15
21 Oct 2024

Use of SNOMED CT in Large Language Models: Scoping Review.
Eunsuk Chang ... Sumi Sung
JMIR Medical Informatics | VOL. 12
07 Oct 2024
