Abstract

Although large language models such as ChatGPT and GPT-4 have achieved strong performance on a variety of natural language processing tasks, their dialogue-level performance remains unclear, because evaluation is often conducted at the utterance level, where the target is the quality of a single utterance given its context. Our objective in this work is to conduct human evaluations of GPT-3.5 and GPT-4 on the MultiWOZ and persona-based chat tasks in order to verify their dialogue-level performance as task-oriented and non-task-oriented dialogue systems, respectively. Our findings show that GPT-4 performs comparably to a carefully crafted rule-based system and, in persona-based chat, significantly outperforms the other systems, including those based on GPT-3.5.

