Abstract
This study addresses the gap in understanding the extent to which AI can replace human technical interviewers in the recruitment process. It investigates the potential of Large Language Models (LLMs), specifically ChatGPT, Google Gemini and Mistral, to assess candidates’ competencies in Information Technology (IT), compared with evaluations made by human experts. In the experiment, three experienced DevOps specialists assessed the written responses of 21 candidates to ten industry-relevant questions, with each response limited to 500 characters. The evaluation used a simple −2 to 2 scale, with −2 indicating a negative assessment for incorrect answers, 0 for ambiguous or incomplete answers, and 2 for excellent responses. The same set of responses was then evaluated by the LLMs using identical criteria and the same scale. This comparative analysis aims to determine how reliably and accurately AI replicates expert human judgement in IT recruitment. The study’s findings, backed by the Fleiss’ kappa test, show that human reviewers are not perfectly aligned in their judgement. The AI tools also lack consistency: repeating the same review request may yield a different decision. The results are anticipated to contribute to the ongoing discourse on AI-assisted decision-making and its practical applications in human resource management and recruitment.
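For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of how Fleiss' kappa can be computed for several raters scoring the same items on the −2 to 2 scale. It is not the authors' code: the function name `fleiss_kappa`, the assumption that every integer from −2 to 2 is a valid score, and the small `example_ratings` table are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: every integer score from -2 to 2 is a valid category.
SCALE = (-2, -1, 0, 1, 2)


def fleiss_kappa(ratings, categories=SCALE):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists, one inner list of scores per rated item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_item_agreement = []
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(counts[c] ** 2 for c in categories) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    # Agreement expected by chance from the overall category proportions.
    expected = sum(
        (category_totals[c] / (n_items * n_raters)) ** 2 for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores: four answers, each rated by three reviewers.
example_ratings = [
    [2, 2, 1],
    [-2, -2, -2],
    [0, 1, 0],
    [2, 0, -1],
]
print(round(fleiss_kappa(example_ratings), 3))
```

A value of 1 indicates perfect agreement, while values near 0 indicate agreement no better than chance. The same function could be applied to repeated runs of a single LLM by treating each run as a separate rater, which is one way to quantify the run-to-run inconsistency described above.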