Abstract

Decompilation is the main process of software development, which is very important when a program tries to retrieve lost source codes. Although decompiling Java bytecode is easier than bytecode, many Java decompilers cannot recover originally lost sources, especially the selection statement, i.e., if statement. This deficiency affects directly decompilation performance. In this paper, we propose the methodology for guiding Java decompiler to deal with the aforementioned problem. In the framework, Java bytecode is transformed into two kinds of features called frame feature and latent semantic feature. The former is extracted directly from the bytecode. The latter is achieved by two-step transforming the Java bytecode to bigram and then term frequency-inverse document frequency (TFIDF). After that, both of them are fed to the genetic algorithm to reduce their dimensions. The proposed feature is achieved by converting the selected TFIDF to a latent semantic feature and concatenating it with the selected frame feature. Finally, KNN is used to classify the proposed feature. The experimental results show that the decompilation accuracy is 93.68 percent, which is obviously better than Java Decompiler.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call