A platform-independent and scalable tool for finding code clones in binary files

H.K. Aslanyan, M.S. Arutunian, S.F. Kurmangaleev, S.S. Sargsyan, V.G. Vardanyan

doi:10.15514/ispras-2016-28(5)-13

Abstract

During software development, developers often copy and paste code fragments to achieve the desired result. Copying code can introduce a variety of errors and increases the size of both the source and the binary code. Because the source code of many programs is unavailable, finding semantically similar code fragments (clones) in binary code has become an important problem. The first part of the article analyzes existing methods for finding code clones in binary code. The second part presents a newly developed tool for this task. The tool works in three main stages. The first stage is based on the Binnavi [1] framework and generates program dependence graphs (PDGs) using REIL (Reverse Engineering Intermediate Language). Using REIL allows graphs to be generated for multiple architectures (x86, x86-64, ARM, MIPS, PPC), which makes the tool independent of the target architecture. In the second stage, code clones are detected from the previously generated graphs: a maximum common subgraph is built for each pair of graphs, and clones are identified based on it. In the third stage, the detected clones are visualized for convenient analysis of the results.
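The second stage can be illustrated with a minimal sketch. It is not the tool's implementation: the graph encoding (a `labels` dictionary and an `edges` set), the function names, and the similarity score are all assumptions made for this example. The search is exhaustive, finds a maximum common induced subgraph that need not be connected, and is practical only for very small graphs.

```python
def max_common_subgraph(g1, g2):
    """Return the largest node mapping g1 -> g2 that preserves labels and edges.

    Exhaustive backtracking search; exponential in the worst case.
    """
    nodes1 = list(g1["labels"])
    best = {}

    def compatible(mapping, u, v):
        # Nodes must carry the same label (here: the instruction mnemonic).
        if g1["labels"][u] != g2["labels"][v]:
            return False
        # Every edge or non-edge to already mapped nodes must agree.
        for a, b in mapping.items():
            if ((a, u) in g1["edges"]) != ((b, v) in g2["edges"]):
                return False
            if ((u, a) in g1["edges"]) != ((v, b) in g2["edges"]):
                return False
        return True

    def extend(i, mapping):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        if i == len(nodes1):
            return
        # Prune branches that cannot beat the best mapping found so far.
        if len(mapping) + (len(nodes1) - i) <= len(best):
            return
        u = nodes1[i]
        used = set(mapping.values())
        for v in g2["labels"]:
            if v not in used and compatible(mapping, u, v):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]
        extend(i + 1, mapping)  # also try leaving u unmatched

    extend(0, {})
    return best


def similarity(g1, g2):
    """Share of the larger graph covered by the common subgraph (assumed metric)."""
    common = max_common_subgraph(g1, g2)
    return len(common) / max(len(g1["labels"]), len(g2["labels"]))


# Two hypothetical dependence graphs; the second lacks the imul node.
pdg_a = {"labels": {0: "mov", 1: "add", 2: "imul", 3: "ret"},
         "edges": {(0, 1), (0, 2), (1, 3)}}
pdg_b = {"labels": {"a": "mov", "b": "add", "c": "ret"},
         "edges": {("a", "b"), ("b", "c")}}

mapping = max_common_subgraph(pdg_a, pdg_b)
score = similarity(pdg_a, pdg_b)
print(mapping, score)
```

A pair of graphs would then be reported as a clone candidate when the score exceeds a chosen threshold. The threshold and the choice of the larger graph as the denominator are illustrative, not values taken from the article.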

Summary

Introduction

A number of code clone detection methods exist, based on textual [2], lexical [3], syntactic [4, 5, 6], and semantic [6, 7, 8, 9, 10, 11, 12, 13] analysis of a program. Most of these methods require the program's source code. Clone detection in binary code has received little study, even though it matters more for finding errors in programs, since most programs are distributed without source code.

Code clones are divided into three types. Type 1: code fragments that match completely. Type 2: code fragments that may differ in types, data values, and register names. Type 3: code fragments that may differ in types, data values, and register names, and may also differ in some instructions (a given fragment may contain or lack certain instructions).

A type 2 clone of a particular fragment differs from it in that register ecx is allocated instead of eax. A type 3 clone of that fragment differs from it in that register ecx is allocated instead of eax and one instruction (imul eax, [ebp+var_4]) is absent.
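The three clone types can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the assembly fragments are hypothetical (only the `imul eax, [ebp+var_4]` instruction comes from the text), the register list is a partial x86 subset, and the 0.6 threshold is arbitrary. Replacing every register with one token is coarser than consistent renaming, so this is a simplification, not the article's method.

```python
import re
from difflib import SequenceMatcher

# Partial list of x86 register names, enough for this example.
REGISTER = re.compile(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p)\b")
# Standalone decimal or hexadecimal constants (not digits inside names like var_4).
NUMBER = re.compile(r"\b(0x[0-9a-fA-F]+|\d+)\b")


def normalize(instruction):
    """Abstract away register names and immediate values."""
    instruction = REGISTER.sub("REG", instruction)
    return NUMBER.sub("IMM", instruction)


def clone_type(fragment_a, fragment_b, threshold=0.6):
    """Return 1, 2, or 3 for the clone type, or None if the fragments are too different."""
    if fragment_a == fragment_b:
        return 1  # exact match
    norm_a = [normalize(i) for i in fragment_a]
    norm_b = [normalize(i) for i in fragment_b]
    if norm_a == norm_b:
        return 2  # same instructions up to registers and data values
    # Type 3 tolerates added or missing instructions.
    ratio = SequenceMatcher(None, norm_a, norm_b).ratio()
    return 3 if ratio >= threshold else None


# Hypothetical fragments.
original = ["mov eax, [ebp+var_8]", "imul eax, [ebp+var_4]", "add eax, 1"]
renamed = ["mov ecx, [ebp+var_8]", "imul ecx, [ebp+var_4]", "add ecx, 1"]
shortened = ["mov ecx, [ebp+var_8]", "add ecx, 1"]

print(clone_type(original, original))
print(clone_type(original, renamed))
print(clone_type(original, shortened))
```

The sequence comparison here works on instruction lists, whereas the tool described in the abstract compares program dependence graphs.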

Approaches to clone detection in binary code

Model of the tool for finding clones in binary files

PDG generation

Splitting PDGs into subgraphs

PDG analysis

Filtering the detected clones

Results

Future work

Conclusion

Journal: Proceedings of the Institute for System Programming of the RAS	Publication Date: Jan 1, 2016
Citations: 3	License type: cc-by
