Detecting Semantic Errors Caused by Incorrect Adaptation of Copied Code Fragments

Sevak Sargsyan

doi:10.15514/ispras-2015-27(2)-6

Abstract

The paper describes a method for detecting semantic errors that arise when a developer copies and pastes code and adapts it incorrectly. The method has two main stages. The first stage detects code clones by lexical analysis of the program: a token sequence is built with the LLVM lexer, and all pairs of maximal, non-intersecting matching token subsequences are found. Each pair of identical subsequences is then partially parsed to keep only complete constructs allowed by the programming language and to discard incomplete ones. If the remaining subsequences are large enough, the second stage is applied to them. A Program Dependence Graph (PDG) is built for the code of the corresponding function, and the subgraphs that correspond to the identical subsequences are examined. If two subgraphs share vertices, the outgoing edges of those vertices are analyzed, which allows semantic errors to be detected with high accuracy. The method is implemented in the LLVM/Clang compiler, so semantic errors are detected at compile time and no separate lexical and semantic analysis of the program is needed. A number of widely used open-source libraries and software systems were analyzed. The paper lists the semantic errors detected in Linux kernel 2.6 and Android 4.3. For these systems, the true positive rate of the approach is above 65%.
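The first (lexical) stage can be illustrated with a minimal Python sketch. It is not the LLVM-based implementation from the paper: the regular-expression tokenizer, the replacement of identifiers and numbers with the placeholders `ID` and `NUM`, the keyword set, and the function names `normalize` and `matched_runs` are all assumptions made for illustration. The standard-library `difflib.SequenceMatcher` stands in for the paper's search for maximal matching token subsequences, and it compares two fragments rather than scanning one whole program.

```python
import re
from difflib import SequenceMatcher

# Toy tokenizer: identifiers/keywords, integer literals, or any single
# non-space character (multi-character operators are split; fine for a sketch).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # assumed subset


def normalize(code):
    """Turn source text into a token list with names and numbers abstracted."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tokens.append("ID")    # every identifier looks the same
        elif tok.isdigit():
            tokens.append("NUM")   # every integer literal looks the same
        else:
            tokens.append(tok)     # keywords and punctuation are kept
    return tokens


def matched_runs(code_a, code_b, min_len=5):
    """Return (start_a, start_b, length) for matching token runs of at least
    min_len tokens; the runs reported by difflib do not overlap."""
    a, b = normalize(code_a), normalize(code_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks() if m.size >= min_len]
```

With `min_len` playing the role of the "large enough" threshold from the abstract, two fragments that differ only in variable names produce a single run covering all their tokens, and such a pair would then be passed on to the dependence-graph stage.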

Highlights

Analysis of large open-source projects has shown that many errors are caused by incorrectly adapted code; for example, as of 2013 the FreeBSD and Linux repositories contained more than 113 and 182 fixes for such errors, respectively [13].
Token generation and construction of the project's Program Dependence Graph (PDG) are performed while the project is being compiled.
Scalable code clone detection tool based on semantic analysis, The Proceedings of ISP RAS, vol 27, issue 1, 2015

Summary

Introduction

During software development, previously written code is often copied, which can introduce semantic errors into the program. Tools for detecting code clones and semantic errors are widely used in software development. The approach based on semantic analysis of the program [9, 10, 11, 12] finds all three types of code clones with high accuracy, but it has high computational complexity. Finding maximal isomorphic subgraphs is an NP-hard problem, and it is solved with approximate algorithms whose complexity can be cubic in the number of PDG vertices. To find errors caused by incorrect adaptation of copied code, methods based on either lexical or syntactic analysis are typically used. A drawback of the syntactic approach is that incorrectly renamed variables affect the shape of the AST, since tree nodes for expressions may appear or disappear. This paper describes a new approach to finding semantic errors caused by incorrect adaptation of copied code. The second stage builds the PDG for the corresponding function in order to find errors made during copying.
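The idea behind the second stage can be sketched on a toy example. The code below is a simplified stand-in and not the PDG analysis from the paper: it handles only straight-line statements of the form `name = expression`, tracks data dependences only (no control dependences), and flags a pair of clones when the dependence edges inside the two fragments differ. All names (`data_dep_edges`, `internal_shape`, `looks_misadapted`) are hypothetical.

```python
import re

NAME_RE = re.compile(r"[A-Za-z_]\w*")


def data_dep_edges(stmts):
    """Return data-dependence edges (def_index, use_index) for a list of
    simple 'name = expression' statements (toy model of a PDG)."""
    last_def = {}   # variable name -> index of the statement that last set it
    edges = set()
    for i, stmt in enumerate(stmts):
        target, expr = (part.strip() for part in stmt.split("=", 1))
        for name in NAME_RE.findall(expr):
            if name in last_def:
                edges.add((last_def[name], i))
        last_def[target] = i
    return edges


def internal_shape(edges, start, length):
    """Edges that stay inside one clone, re-indexed from the clone's start."""
    inside = range(start, start + length)
    return {(s - start, d - start)
            for s, d in edges if s in inside and d in inside}


def looks_misadapted(stmts, start_a, start_b, length):
    """Flag a clone pair whose internal dependence structure differs."""
    edges = data_dep_edges(stmts)
    return (internal_shape(edges, start_a, length)
            != internal_shape(edges, start_b, length))


original = ["t = a + 1", "u = t * 2", "r = u + t"]
good_copy = ["s = b + 1", "v = s * 2", "q = v + s"]
bad_copy = ["s = b + 1", "v = s * 2", "q = v + t"]  # 't' was not renamed

print(looks_misadapted(original + good_copy, 0, 3, 3))  # False
print(looks_misadapted(original + bad_copy, 0, 3, 3))   # True
```

In the incorrectly adapted copy, the last statement still reads `t`, so its dependence edge leads back into the original fragment instead of staying inside the copy. The two fragments are token-for-token identical once names are abstracted, yet their dependence structure differs, which is the kind of discrepancy a dependence-graph comparison can expose.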

Clone types

Model of the semantic error detection tool

Code clone detection based on lexical analysis

Error detection

Metric-based isomorphism checking

Results

Conclusion


Journal: Proceedings of the Institute for System Programming of the RAS
Publication Date: Jan 1, 2015
License type: CC BY
