Reverse Engineering Tasks Research Articles

The article presents an analysis and evaluation of the capabilities of the open version of the AI chatbot ChatGPT 3.5 for solving typical software reverse engineering problems. Three classes of reverse engineering tasks were selected: source code analysis, binary code analysis, and data model analysis. Within each class, the most typical tasks were chosen, taking into account ChatGPT's limitations in processing graphical models and in the volume of input and output data, and a set of test tasks was developed for each. The assessment approach resembles the way the competencies of higher education students are assessed after they complete the relevant discipline. Answers to test tasks were evaluated against four criteria: correctness (the answer matches expectations); completeness (the final result is obtained); accuracy (the task is solved without additional questions); and reasonableness (explanations and answers to questions are provided). Grades were assigned on a described scale: excellent, very good, good, satisfactory, sufficient, unsatisfactory. During testing, the task statement and all necessary data for each test task were entered through the ChatGPT interface. The analysis found that ChatGPT performs best on source code analysis (excellent and very good grades for semantic and structural analysis, restoration of the underlying mathematics, quality assessment, security audit and refactoring, and conversion to another programming language), on decompiling IDA pseudocode into complete C source code, on reverse engineering relational databases, and on generating YARA rules for recognizing file formats.
Unsatisfactory grades were obtained for dynamic analysis of assembly code and for determining the structure of binary files in non-standard formats. ChatGPT handles the remaining tasks at a good or satisfactory level, but its results need to be checked, queries and prompts refined, and in some cases errors corrected manually. Errors were observed when ChatGPT analyzed binary data given as hexadecimal characters, and in the IDA scripts it generated. Based on the assigned grades, conclusions were drawn about whether using ChatGPT is advisable, possible, or impractical for each type of software reverse engineering problem, and corresponding recommendations were provided. Prospects for further research include testing new versions of ChatGPT and other similar AI systems on their ability to analyze and synthesize graphical models of software.

Identifying free open-source software (FOSS) packages in binaries when the source code is unavailable is important for many security applications, such as malware detection, software infringement, and digital forensics. This capability improves both the accuracy and the efficiency of reverse engineering tasks by avoiding false correlations between irrelevant code bases. Although the FOSS package identification problem belongs to the field of software engineering, conventional approaches rely heavily on practical methods from data mining and database searching. However, various challenges in applying these methods prevent existing function identification approaches from being effective in the absence of source code. To make matters worse, obfuscation techniques, the use of different compilers and compilation settings, and software refactoring have made the automated detection of FOSS packages increasingly difficult. With very few exceptions, existing systems are not resilient to such techniques, and the exceptions are not sufficiently efficient. To address this issue, we propose FOSSIL, a novel resilient and efficient system that incorporates three components. The first component extracts the syntactical features of functions by considering opcode frequencies and applying a hidden Markov model statistical test. The second component applies a neighborhood hash graph kernel to random walks derived from control-flow graphs, with the goal of extracting the semantics of the functions. The third component applies a z-score to the normalized instructions to extract the behavior of instructions in a function. The components are integrated using a Bayesian network model, which synthesizes their results to determine the FOSS function. Combining the components through the Bayesian network yields stronger resilience to code obfuscation.
We evaluate our system on three datasets: real-world projects whose use of FOSS packages is known, malware binaries for which security and reverse engineering reports purport to describe their use of FOSS, and a large repository of malware binaries. We demonstrate that our system identifies FOSS packages in real-world projects with a mean precision of 0.95 and a mean recall of 0.85. Furthermore, FOSSIL discovers FOSS packages in malware binaries that match those listed in security and reverse engineering reports. Our results show that modern malware binaries contain 0.10–0.45 of FOSS packages.
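
As a rough illustration of the z-score idea mentioned in the abstract above, the sketch below scores the opcode frequencies of one function against a small reference population. It is not FOSSIL's implementation: the function names `opcode_frequencies` and `z_scores`, the use of plain mnemonic lists as input, and the choice of a population standard deviation are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def opcode_frequencies(opcodes):
    """Relative frequency of each opcode mnemonic in one function.

    `opcodes` is a hypothetical input: a list of already-normalized
    mnemonics such as ["mov", "add", "ret"].
    """
    counts = Counter(opcodes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


def z_scores(target, population):
    """z-score of each opcode frequency in `target` against `population`.

    `population` is a list of frequency dicts for reference functions.
    An opcode whose frequency does not vary across the population
    (standard deviation of zero) is assigned a score of 0.0.
    """
    scores = {}
    for op, freq in target.items():
        values = [f.get(op, 0.0) for f in population]
        mu = mean(values)
        sigma = pstdev(values)
        scores[op] = (freq - mu) / sigma if sigma > 0 else 0.0
    return scores


# Illustrative data only; these opcode lists are made up.
reference = [
    opcode_frequencies(["mov", "mov", "add", "ret"]),
    opcode_frequencies(["mov", "ret"]),
]
candidate = opcode_frequencies(["mov", "mov", "mov", "ret"])
candidate_scores = z_scores(candidate, reference)
```

A score far from zero marks an opcode that the candidate function uses much more or less often than the reference functions do. How such scores would feed into a Bayesian network is outside the scope of this sketch.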

Articles published on Reverse Engineering Tasks

Analysis of ChatGPT's capabilities for solving problems of reverse-engineering of software

New demand on assembly language proficiency in performing binary reverse engineering tasks

A review of data abstraction.

Malware-on-the-Brain: Illuminating Malware Byte Codes With Images for Malware Classification

SMA-Net: Deep learning-based identification and fitting of CAD models from point clouds

Binary code traceability of multigranularity information fusion from the perspective of software genes

A Method for Recovering Protocol Automata from Binary Code

Application of Selected Reverse Engineering Procedures Based on Specific Requirements

Adabot: Fault-Tolerant Java Decompiler (Student Abstract)

Systematic Approach to Malware Analysis (SAMA)

CPA: Accurate Cross-Platform Binary Authorship Characterization Using LDA

On the feasibility of binary authorship characterization

An extended framework for knowledge modelling and reuse in reverse engineering projects

Incremental Decompilation of Loop-Free Binary Code: Erlang

FOSSIL

Metrological aspects of reverse engineering of standardized products

Using Sub-Network Combinations to Scale Up an Enumeration Method for Determining the Network Structures of Biological Functions.

Multi-objective reverse engineering of variability-safe feature models based on code dependencies of system variants

A novel method for 3D reconstruction: Division and merging of overlapping B-spline surfaces

Capturing Uncertainty Information and Categorical Characteristics for Network Payload Grouping in Protocol Reverse Engineering
