Binary Code Analysis Research Articles

This article analyzes and evaluates how well the publicly available version of the ChatGPT 3.5 chatbot solves typical software reverse-engineering problems. Three classes of tasks were selected: source code analysis, binary code analysis, and data model analysis. Within each class, the most typical tasks were chosen, taking into account ChatGPT's limitations in processing graphical models and in the volume of input and output data, and a set of test tasks was developed for each. The assessment approach resembles the way the competencies of higher-education students are assessed after they complete the relevant course. Answers were evaluated against four criteria: correctness (the answer is right and matches expectations); completeness (a final result is obtained); accuracy (the task is solved without additional questions); and reasonableness (explanations and answers to questions are available). Grades were assigned on a described scale: excellent, very good, good, satisfactory, sufficient, unsatisfactory. During testing, the task statement and all necessary data for each test task were entered through the ChatGPT interface.

The analysis found that ChatGPT performs best on source code analysis (excellent and very good grades for semantic and structural analysis, recovery of the underlying mathematics, quality assessment, security audit and refactoring, and conversion to another programming language), decompilation of IDA pseudocode into complete C source code, reverse engineering of relational databases, and generation of YARA rules for recognizing file formats. Unsatisfactory grades were obtained for dynamic analysis of assembly code and for determining the structure of binary files in non-standard formats. ChatGPT solves the remaining problems at a good or satisfactory level, but the results must be checked, queries and prompts must be clarified, and in some cases errors must be corrected manually. Errors were observed when ChatGPT analyzed binary data represented as hexadecimal characters, and in the scripts it generated for IDA. Based on the grades, conclusions were drawn about whether using ChatGPT is expedient, possible, or impractical for each type of software reverse-engineering problem, and corresponding recommendations were provided. Prospects for further research include testing new versions of ChatGPT and other similar artificial intelligence systems on their ability to analyze and synthesize graphical models of software.

We prove that, for the binary erasure channel (BEC), the polar-coding paradigm gives rise to codes that not only approach the Shannon limit but do so under the best possible scaling of their block length as a function of the gap to capacity. This result exhibits the first known family of binary codes that attain both optimal scaling and quasi-linear complexity of encoding and decoding. Our proof is based on the construction and analysis of binary polar codes with large kernels. When communicating reliably at rates within ε > 0 of capacity, the code length n often scales as O(1/ε^μ), where the constant μ is called the scaling exponent. It is known that the optimal scaling exponent is μ = 2, and it is achieved by random linear codes. The scaling exponent of conventional polar codes (based on the 2×2 kernel) on the BEC is μ = 3.63. This falls far short of the optimal scaling guaranteed by random codes. Our main contribution is a rigorous proof of the following result: for the BEC, there exist l × l binary kernels, such that polar codes constructed from these kernels achieve scaling exponent μ(l) that tends to the optimal value of 2 as l grows. We furthermore characterize precisely how large l needs to be as a function of the gap between μ(l) and 2. The resulting binary codes maintain the recursive structure of conventional polar codes, and thereby achieve construction complexity O(n) and encoding/decoding complexity O(n log n).
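To make the polarization idea concrete, here is a minimal sketch of the conventional 2×2 kernel on the BEC, not the large-kernel construction the abstract describes. It relies on the standard fact that one polarization step turns a BEC with erasure probability z into two synthetic channels with erasure probabilities 2z − z² and z². The function names and the 1e-3 "good channel" threshold are illustrative assumptions, not taken from the paper.

```python
def polarize_bec(erasure_prob: float, levels: int) -> list[float]:
    """Return the erasure probabilities of the 2**levels synthetic channels
    obtained by applying the 2x2 polar transform recursively to a BEC."""
    channels = [erasure_prob]
    for _ in range(levels):
        next_channels = []
        for z in channels:
            next_channels.append(2 * z - z * z)  # degraded ("minus") channel
            next_channels.append(z * z)          # upgraded ("plus") channel
        channels = next_channels
    return channels


def fraction_good(channels: list[float], threshold: float = 1e-3) -> float:
    """Fraction of synthetic channels whose erasure probability is below
    the (arbitrarily chosen) threshold."""
    return sum(z < threshold for z in channels) / len(channels)


if __name__ == "__main__":
    # Assumed example: BEC with erasure probability 0.5 (capacity 0.5).
    for levels in (4, 8, 12, 16):
        channels = polarize_bec(0.5, levels)
        print(f"n = 2^{levels}: good fraction = {fraction_good(channels):.3f}")
```

As the block length n = 2^levels grows, the fraction of nearly noiseless channels climbs toward the capacity 0.5. How quickly that gap closes is what the scaling exponent μ measures.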

Articles published on Binary Code Analysis

Pitfalls in Machine Learning for Computer Security

Classification of malware for security improvement in IoT using heuristic aided adaptive multi-scale and dilated ResneXt with gated recurrent unit

Analysis of ChatGPT's capabilities for solving problems of reverse-engineering of software

An Inclusive Report on Robust Malware Detection and Analysis for Cross-Version Binary Code Optimizations

Binary Code Representation With Well-Balanced Instruction Normalization

Return Instruction Classification in Binary Code Using Machine Learning

Efficient Binary Static Code Data Flow Analysis Using Unsupervised Learning

Disassemble Byte Sequence Using Graph Attention Network

MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in Internet of Things

Binary Linear Codes With Optimal Scaling: Polar Codes With Large Kernels

Practical Abstract Interpretation of Binary Code

X64Unpack: Hybrid Emulation Unpacker for 64-bit Windows Environments and Detailed Analysis Results on VMProtect 3.4

Hybrid Neural Network Model for Protection of Dynamic Cyber Infrastructure

Next-Generation Intermediate Representations for Binary Code Analysis

Parallelization of Implementations of Purely Sequential Algorithms

Machine Learning-Based Analysis of Program Binaries: A Comprehensive Study

Decoding Machine Instructions in the Abstract Interpretation of Binary Code

Automatic Detection and Bypassing of Anti-Debugging Techniques for Microsoft Windows Environments

On a New Generation of Intermediate Representations Used for Binary Code Analysis

A Platform for Interprocedural Static Analysis of Binary Code
