Evaluating the performance of ChatGPT and Perplexity AI in Business Reference

Michael Deike

doi:10.1080/08963568.2024.2317534

Abstract

The Thomas Mahaffey Jr. Business Library conducted a study to assess how well two competing generative AI products, ChatGPT and Perplexity AI, answer business reference questions. The data set was a sample of anonymized reference questions submitted through the library’s ServiceNow ticketing system between January 2018 and May 2022. Each question was input as a prompt to both AI products. Responses were collected and evaluated on four dimensions relevant to business reference: accessibility, library referral, quality, and serendipity. Each dimension was scored on a 0-5 Likert scale, yielding a final composite performance score for each AI. Results showed similar and underwhelming performance from the two products at the composite level. Analysis of the individual scoring dimensions showed greater variance in score distributions between the two. Through the evaluation process, key strengths, weaknesses, and trends emerged for each AI. The study provides a quantitative measure of where generative AI stands in a business library reference context and, based on the evaluation results, recommends using generative AI in its current iteration as a supplementary tool for business reference rather than as a replacement.
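
The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration, not the study's actual procedure: the abstract does not state how the four dimension scores are combined, so the sketch assumes a simple sum (giving a 0-20 composite), and the function names and the sample scores are hypothetical.

```python
from statistics import mean

# The four scoring dimensions named in the abstract.
DIMENSIONS = ("accessibility", "library_referral", "quality", "serendipity")


def composite_score(scores: dict[str, int]) -> int:
    """Combine one response's four 0-5 dimension scores into a composite.

    Assumption: the composite is the plain sum of the dimensions (0-20).
    The abstract does not specify the aggregation rule.
    """
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside the 0-5 scale")
    return sum(scores[name] for name in DIMENSIONS)


def mean_composite(responses: list[dict[str, int]]) -> float:
    """Average the composite score over all scored responses for one AI."""
    return mean(composite_score(r) for r in responses)


# Hypothetical scores for two responses, for illustration only.
example_responses = [
    {"accessibility": 3, "library_referral": 4, "quality": 2, "serendipity": 1},
    {"accessibility": 5, "library_referral": 2, "quality": 3, "serendipity": 2},
]
print(mean_composite(example_responses))
```

Keeping the per-dimension scores alongside the composite matters here, since the abstract reports that the two products looked similar at the composite level but diverged within individual dimensions.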

Full Text