This paper addresses IoT security problems caused by code cloning during the development of a wide variety of smart devices. A clone detection method is proposed to identify clone-induced vulnerabilities in IoT software. The method is a hybrid that combines syntactic and semantic analyses of the code. From the recovered code, an attributed abstract syntax tree (AST) is constructed for each code fragment by weighting every node of the conventional AST with a semantic attribute vector. Each attributed tree is then encoded as a semantic vector by a Deep Graph Neural Network. Two such graph networks are combined into a Siamese neural model, so that within each training epoch the model both generates semantic vectors and compares vector pairs. Semantic analysis is also applied to candidate clones with low similarity scores, which makes it possible to correct the similarity decision when functions have been matched incorrectly at the syntactic level. To automate the clone search, the BinDiff algorithm is used in a first stage to select clone candidates accurately, which makes the method practical for large sets of binary code. In an experimental study comparing the developed method with the BinDiff, Gemini, and Asteria tools, the proposed method demonstrated the highest efficiency.
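To make the Siamese idea concrete, the sketch below encodes two attributed ASTs with one shared-weight graph encoder and compares the resulting vectors by cosine similarity. It also shows how a low syntactic score can be overridden by the semantic check. It is a minimal illustration under stated assumptions, not the authors' architecture. The `Node` class, the three-element attribute vectors, the untrained weights `W_SELF` and `W_NEIGH`, the two rounds of neighbor aggregation with sum pooling, and the thresholds in `is_clone` are all hypothetical. The `syntactic_score` argument stands in for a first-stage score such as one produced by BinDiff; BinDiff itself is not called here.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """AST node carrying a semantic attribute vector (hypothetical encoding)."""
    attrs: list[float]
    children: list["Node"] = field(default_factory=list)

# Illustrative, untrained weights shared by BOTH Siamese branches (assumption).
W_SELF = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]   # hidden(2) x attr(3)
W_NEIGH = [[0.1, 0.2], [-0.3, 0.4]]             # hidden(2) x hidden(2)

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def flatten(root):
    """Return the node list and undirected parent-child adjacency lists."""
    nodes, neighbors = [], []
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        idx = len(nodes)
        nodes.append(node)
        neighbors.append([])
        if parent is not None:
            neighbors[idx].append(parent)
            neighbors[parent].append(idx)
        for child in node.children:
            stack.append((child, idx))
    return nodes, neighbors

def encode(root, w_self=W_SELF, w_neigh=W_NEIGH, rounds=2):
    """Message passing over the attributed AST, then sum pooling into one vector."""
    nodes, neighbors = flatten(root)
    base = [matvec(w_self, n.attrs) for n in nodes]  # project node attributes
    h = [[math.tanh(x) for x in b] for b in base]
    for _ in range(rounds):
        new_h = []
        for i in range(len(nodes)):
            agg = [0.0] * len(h[i])
            for j in neighbors[i]:                   # sum neighbor states
                agg = [a + b for a, b in zip(agg, h[j])]
            msg = matvec(w_neigh, agg)
            new_h.append([math.tanh(b + m) for b, m in zip(base[i], msg)])
        h = new_h
    return [sum(v[k] for v in h) for k in range(len(h[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def siamese_similarity(t1, t2):
    # Both trees pass through the same encoder weights: the "Siamese" part.
    return cosine(encode(t1), encode(t2))

def is_clone(syntactic_score, t1, t2, accept=0.9, semantic_threshold=0.9):
    """Accept strong syntactic matches; re-check low-scoring pairs semantically."""
    if syntactic_score >= accept:
        return True
    return siamese_similarity(t1, t2) >= semantic_threshold
```

For example, two structurally identical trees such as `Node([1.0, 0.0, 0.0], [Node([0.0, 1.0, 0.0]), Node([0.0, 0.0, 1.0])])` receive a cosine similarity of 1.0 (up to rounding). As a result, `is_clone(0.2, t1, t2)` returns `True` even though the hypothetical syntactic score is low. This mirrors the correction step the abstract describes. In the real method, the encoder weights would be learned during Siamese training rather than fixed by hand.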