Abstract

The advent of Generative Artificial Intelligence raises essential questions about whether, and when, AI will replace human abilities in accomplishing everyday tasks. These questions are particularly pressing in software development, where generative AI appears highly capable of solving coding problems and generating source code. In this paper, an empirical evaluation of AI-generated source code is performed: three complex coding problems (selected from the exams for the Java Programming course at the University of Insubria) are submitted as prompts to three different Large Language Model (LLM) engines, and the generated code is evaluated for correctness and quality by means of human-implemented test suites and quality metrics. The experiments show that the three evaluated LLM engines are able to solve all three exams, but only under the constant supervision of software experts. At present, LLM engines need the support of human experts to produce running code of good quality.
