This article analyzes and evaluates how well the openly available version of the AI chatbot ChatGPT 3.5 solves typical problems of software reverse engineering. Three classes of reverse engineering tasks were selected for analysis: source code analysis, binary code analysis, and data model analysis. Within each class, the most typical tasks were chosen, taking into account ChatGPT's limitations in processing graphical models and in the amount of input and output data, and a set of test tasks was developed for each. The assessment approach is similar to the one used to evaluate the competencies of higher education students after they complete the relevant course. Answers to the test tasks were evaluated against four criteria: correctness (the answer is right and matches expectations); completeness (the final result is obtained); accuracy (the task is solved without additional questions); and reasonableness (explanations are provided and follow-up questions are answered). Grades were assigned on a described scale: excellent, very good, good, satisfactory, sufficient, unsatisfactory. During testing, the task statement and all necessary data for each test task were entered through the ChatGPT interface. The analysis showed that ChatGPT performs best on source code analysis (excellent and very good grades were obtained for semantic and structural analysis, recovery of the underlying mathematics, quality assessment, security audit, and refactoring, as well as for conversion to another programming language), on decompiling IDA pseudocode into complete C source code, on reverse engineering relational databases, and on generating YARA rules for recognizing file formats.
Unsatisfactory grades were obtained for dynamic analysis of assembly code and for determining the structure of binary files in non-standard formats. ChatGPT solves the remaining problems at a good or satisfactory level, but its results must be checked, its queries and prompts clarified, and in some cases its errors corrected manually. Errors were observed when ChatGPT analyzed binary data represented as hexadecimal characters, and also in the scripts it generated for IDA. Based on the assigned grades, conclusions were drawn about whether using ChatGPT is advisable, possible, or impractical for each type of software reverse engineering problem, and corresponding recommendations are provided. Future research includes testing new versions of ChatGPT and other similar artificial intelligence systems on their ability to analyze and synthesize graphical models of software.