Abstract

We propose SiCaGCN, a learning system that predicts the similarity of a given software code to a set of codes that are permitted to run on a computational resource, such as a supercomputer or a cloud server. This code characterization allows us to detect abusive codes. Our system relies on a structural analysis of the control-flow graphs of the software codes and on two different graph similarity measures: Graph Edit Distance (GED) and a singular-value-based metric. SiCaGCN combines elements of Graph Convolutional Neural Networks (GCN), capsule networks, an attention mechanism, and neural tensor networks. Our experimental results include a study of the trade-offs between the two similarity metrics and between two variations of our learning network, with and without capsules. Our main findings are that capsules significantly reduce the mean squared error for both similarity metrics, and that they reduce the runtime of the GED calculation while increasing the runtime of the singular-value calculation.

Highlights

  • In the era of exascale computing, code characterization is extremely important for supercomputing centers and cloud vendors

  • Our contributions are as follows: first, we introduce a new Graph Convolutional Neural Network (GCN) architecture that captures different latent properties of programs using capsules; second, we prepare a new code dataset composed of various C/C++ programs; third, we apply SiCaGCN to the code dataset to produce a similarity metric between a pair of control-flow graphs of basic blocks, an approach that can be extended to other graph datasets as well; fourth, we compare the two similarity metrics to observe which one differentiates codes more efficiently

  • Experiments: While SiCaGCN can in principle be used for a large class of graph similarity learning problems, our experiments apply it to code characterization, which we describe in more detail


Summary

INTRODUCTION

In the era of exascale computing, code characterization is extremely important for supercomputing centers and cloud vendors. To build high-quality graph embeddings, the properties of nodes with respect to the graph, along with the structures around each node, play an important role. We use these graph embeddings to calculate similarity metrics between graphs and to characterize the codes. We apply our approach to detect abusive use of compute resources, where we successfully identify applications that run bitcoin mining algorithms on DOE resources. Our contributions are as follows: first, we introduce a new GCN architecture that captures different latent properties of programs using capsules; second, we prepare a new code dataset composed of various C/C++ programs; third, we apply SiCaGCN to the code dataset to produce a similarity metric between a pair of control-flow graphs of basic blocks, an approach that can be extended to other graph datasets as well; fourth, we compare the two similarity metrics to observe which one differentiates codes more efficiently. The rest of the paper is organized as follows: Section II describes the existing work; Section III presents the SiCaGCN architecture; Section IV explains the experimental results; and Section V concludes and recommends future directions.
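
To make the notion of a similarity metric between control-flow graphs more concrete, the following is a minimal sketch of an exact Graph Edit Distance for very small graphs. It is not the SiCaGCN pipeline and not the authors' implementation: the function name `graph_edit_distance`, the graph encoding (a node count plus a set of directed edges), the unit edit costs, and the two toy graphs are all assumptions made for illustration. The brute-force search is exponential in the number of nodes, which hints at why a learned predictor of GED is attractive for real programs.

```python
from itertools import permutations


def graph_edit_distance(n1, edges1, n2, edges2):
    """Exact GED between two small, unlabeled, directed graphs.

    Assumptions (illustrative, not taken from the paper):
    - nodes are the integers 0..n-1 and carry no labels;
    - edges are (src, dst) pairs;
    - inserting or deleting a node or an edge costs 1.
    Brute force over node mappings, so only usable for tiny graphs.
    """
    n = max(n1, n2)  # pad the smaller graph with dummy nodes
    e1, e2 = set(edges1), set(edges2)
    best = None
    for perm in permutations(range(n)):
        # perm[u] is the node of graph 2 that node u of graph 1 maps to.
        # A real node mapped to a dummy node (or vice versa) is a node
        # deletion (or insertion).
        node_cost = sum(1 for u in range(n) if (u < n1) != (perm[u] < n2))
        # Relabel the edges of graph 1 into graph 2's node space; every
        # edge present on only one side must be deleted or inserted.
        mapped = {(perm[a], perm[b]) for a, b in e1}
        edge_cost = len(mapped ^ e2)
        cost = node_cost + edge_cost
        if best is None or cost < best:
            best = cost
    return best


# Toy control-flow graphs (hypothetical): a straight-line sequence of three
# basic blocks, and an if/else "diamond" of four basic blocks.
straight_line = (3, {(0, 1), (1, 2)})
diamond = (4, {(0, 1), (0, 2), (1, 3), (2, 3)})

print(graph_edit_distance(*straight_line, *diamond))
```

For the two toy graphs, turning the straight-line graph into the diamond requires inserting one node and two edges. The sketch covers only the first of the two metrics named in the text; the singular-value-based metric is not defined in this summary, so no example is attempted for it.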

BACKGROUND
GLOBAL GRAPH CAPSULES FORMATION MODULE
SCORING MODULE
EXPERIMENTS
EVALUATION METRICS
CASE STUDY
CONCLUSION AND FUTURE WORK