Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency

Sakib Shahriar,Muhammad Arbab Arshad,Nishith Reddy Mannuru,Brady D Lund,Kadhim Hayawi,Aashrith Mannuru,Ravi Varma Kumar Bevara,Laiba Batool

doi:10.3390/app14177782

Abstract

As large language models (LLMs) continue to advance, evaluating their comprehensive capabilities becomes significant for their application in various fields. This research study comprehensively evaluates the language, vision, speech, and multimodal capabilities of GPT-4o. The study employs standardized exam questions, reasoning tasks, and translation assessments to assess the model’s language capability. Additionally, GPT-4o’s vision and speech capabilities are tested through image classification and object-recognition tasks, as well as accent classification. The multimodal evaluation assesses the model’s performance in integrating visual and linguistic data. Our findings reveal that GPT-4o demonstrates high accuracy and efficiency across multiple domains in language and reasoning capabilities, excelling in tasks that require few-shot learning. GPT-4o also provides notable improvements in multimodal tasks compared to its predecessors. However, the model shows variability and faces limitations in handling complex and ambiguous inputs, particularly in audio and vision capabilities. This paper highlights the need for more comprehensive benchmarks and robust evaluation frameworks, encompassing qualitative assessments involving human judgment, as well as error analysis. Future work should focus on expanding datasets, investigating prompt-based assessment, and enhancing few-shot learning techniques to test the model’s practical applicability and performance in real-world scenarios.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Journal: Applied Sciences	Publication Date: Sep 3, 2024
Citations: 1	License type: CC BY 4.0

R Discovery Prime

R Discovery Prime

Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency

Abstract

Talk to us

Similar Papers

More From: Applied Sciences

Lead the way for us

Similar Papers

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
Galit Shmueli, et. al.Galit Shmueli ... Bianca Maria Colosimo
01 Apr 2023
INFORMS Journal on Data Science | VOL. 2

Improving the use of LLMs in radiology through prompt engineering: from precision prompts to zero-shot learning.
Fabian Bamberg ... Maximilian Frederik Russe
RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren | VOL. -
Fabian Bamberg, et. al.Fabian Bamberg ... Maximilian Frederik Russe
26 Feb 2024
RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren | VOL. -

CancerGPT for few shot drug pair synergy prediction using large pretrained language models
Tianhao Li ... Yejin Kim
npj Digital Medicine | VOL. 7
Tianhao Li, et. al.Tianhao Li ... Yejin Kim
19 Feb 2024
npj Digital Medicine | VOL. 7

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu ... Zhuowen Tu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
Wenbo Hu, et. al.Wenbo Hu ... Zhuowen Tu
24 Mar 2024
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency

Abstract

Talk to us

Similar Papers

More From: Applied Sciences