Abstract

In the rapidly evolving domain of generative AI (GAI) and large language models (LLMs), comparative analyses of tools are essential to measure their performance. In this study, we compare two leading GAI tools, ChatGPT and Bard, in the cybersecurity domain, using a robust set of standardized questions drawn from a validated Certified Ethical Hacker (CEH) exam dataset. We assess the Comprehensiveness, Clarity, and Conciseness of the AI-generated responses through a detailed question-based framework. The study revealed an overall accuracy rate of 80.8% for ChatGPT and 82.6% for Bard, indicating comparable capabilities with specific differences: Bard slightly outperformed ChatGPT in accuracy, while ChatGPT produced responses superior in Comprehensiveness, Clarity, and Conciseness. Introducing a confirmation query such as “Are you sure?” increased accuracy for both tools, illustrating the potential of iterative query processing to enhance the effectiveness of GAI tools. A readability evaluation placed both tools at a college reading level, with Bard’s responses marginally more accessible. For certain questions, a distinct pattern emerged: Bard issued generic refusals of assistance, whereas ChatGPT explicitly referenced “ethics.” This discrepancy reflects the contrasting philosophies of the tools’ developers, with Bard possibly following stricter guidelines, especially on sensitive topics such as cybersecurity. We explore the implications of these findings and identify key areas for future research that become increasingly relevant as GAI tools see broader adoption.
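
To make the evaluation procedure concrete, the sketch below shows how the question-based framework summarized above might be reproduced. This is a minimal illustration, not the authors' code: `query_model` is a hypothetical stand-in for the ChatGPT/Bard chat interfaces, the sample question is invented, and the readability measure uses the third-party `textstat` package, whose Flesch-Kincaid grade of roughly 13-16 corresponds to a college reading level.

```python
import textstat


def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a call to ChatGPT or Bard; a real
    # harness would send `prompt` to the tool under test and return
    # its textual reply.
    return "B. A honeypot is a decoy system deployed to attract and study attackers."


def evaluate(question: str, correct_option: str) -> dict:
    first = query_model(question)
    # Iterative query processing: the study found that a follow-up
    # confirmation query ("Are you sure?") raised accuracy for both tools.
    second = query_model(f"{question}\nYou answered: {first}\nAre you sure?")
    return {
        "initial_correct": first.strip().upper().startswith(correct_option.upper()),
        "confirmed_correct": second.strip().upper().startswith(correct_option.upper()),
        # Flesch-Kincaid grade level of the final answer (~13-16 = college).
        "fk_grade": textstat.flesch_kincaid_grade(second),
    }


sample_question = (
    "Which of the following best describes a honeypot?\n"
    "A. A proxy server\nB. A decoy system\nC. An intrusion detection system\nD. A VPN"
)
print(evaluate(sample_question, "B"))
```

In a full run, aggregating `initial_correct` and `confirmed_correct` over the whole CEH question set would yield the per-tool accuracy figures, and averaging `fk_grade` would yield the readability comparison.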
