Abstract
Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time.
Highlights
The Java Virtual Machine (JVM) was designed for portable and efficient implementation of Java
We study soundness and precision of call graphs constructed from JVM bytecode produced from Scheme, Scala, OCaml, Groovy, Clojure, Python, and Ruby programs
WALA and DOOP [38], [39] support invokedynamic and proxies, and SOOT relies on Tamiflex to analyze reflective code. In contrast to these studies, which consider the challenges posed by dynamic language features in the context of synthetic benchmarks, our work focuses on identifying and analyzing the challenges for static analysis that arise in the JVM bytecodes produced by compilers for seven programming languages, using programs taken from an existing benchmark suite and from open-source repositories.
Summary
The Java Virtual Machine (JVM) was designed for portable and efficient implementation of Java. The JVM has been used to implement programming languages such as Clojure [1], Groovy [2], OCaml [3], Python [4], Ruby [5], Scala [6], and Scheme [7]. By compiling these languages to JVM bytecode, language implementors significantly reduce the amount of work needed to implement their languages, and achieve portability across many platforms. Several frameworks, such as Chord [8], Doop [9], Soot [10], Wala [11], and OPAL [12], have been developed to support static analysis of JVM bytecode. From a static analysis perspective, by denoting a method explicitly, method handles enable more precise analysis than what was possible using the reflective idioms required before Java 7.
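To make the contrast between reflective idioms and method handles concrete, the following is a minimal Java sketch and is not taken from the paper. The class `HandleVsReflection` and its method names are hypothetical, chosen only for illustration. In compiled bytecode, method handles passed as invokedynamic bootstrap arguments are constant-pool entries; the source-level lookup API used here is only the closest analogue that can be written in plain Java.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical illustration class; not part of any analysis framework.
final class HandleVsReflection {

    // Reflective idiom: the target is selected by a run-time string, so a
    // static analysis must know the possible values of `name` to add an edge.
    static Object callReflectively(String receiver, String name) {
        try {
            Method m = String.class.getMethod(name);
            return m.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Method handle: class, method name, and signature are stated together
    // at one lookup site, so the call target is denoted explicitly.
    static final MethodHandle LENGTH;
    static {
        try {
            LENGTH = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int callViaHandle(String receiver) {
        try {
            // invokeExact requires the call-site type (String)int to match
            // the handle's type exactly.
            return (int) LENGTH.invokeExact(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callReflectively("bytecode", "length")); // prints 8
        System.out.println(callViaHandle("bytecode"));              // prints 8
    }
}
```

Both calls reach `String.length()`, but only the second names its target independently of a value that flows in at run time, which is the property a call graph builder can exploit.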