A Study of Call Graph Construction for JVM-Hosted Languages

Karim Ali, Zhaoyi Luo, Julian Dolby, Frank Tip, Xiaoni Lai, Ondrej Lhotak

doi:10.1109/tse.2019.2956925

Abstract

Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time.

Highlights

The Java Virtual Machine (JVM) was designed for portable and efficient implementation of Java
We study soundness and precision of call graphs constructed from JVM bytecode produced from Scheme, Scala, OCaml, Groovy, Clojure, Python, and Ruby programs
WALA and DOOP [38], [39] support invokedynamic and proxies, and SOOT relies on Tamiflex to analyze reflective code. In contrast to these studies, which consider the challenges posed by dynamic language features in the context of synthetic benchmarks, our work has focused on identifying and analyzing the challenges posed to static analysis that arise in the JVM bytecodes produced by compilers for 7 programming languages on programs taken from an existing benchmark suite and from open-source repositories

Summary

Introduction

The Java Virtual Machine (JVM) was designed for portable and efficient implementation of Java. The JVM has been used to implement programming languages such as Clojure [1], Groovy [2], OCaml [3], Python [4], Ruby [5], Scala [6], and Scheme [7]. By compiling these languages to JVM bytecode, language implementors significantly reduce the amount of work needed to implement their languages, and achieve portability across many platforms. Several frameworks, such as Chord [8], Doop [9], Soot [10], Wala [11], and OPAL [12], have been developed to support static analysis of JVM bytecode. From a static analysis perspective, by denoting a method explicitly, method handles enable more precise analysis than what was possible using the reflective idioms required before Java 7.
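The contrast between reflective idioms and method handles can be illustrated with a minimal Java sketch. It is not taken from the paper: the class name HandleVsReflection and both helper methods are hypothetical, and only standard java.lang.reflect and java.lang.invoke calls are used. The reflective version identifies its target by name alone and passes arguments as untyped objects, whereas the method handle is created with a full MethodType that the call site must match exactly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical illustration class; not part of the paper or of any analysis framework.
final class HandleVsReflection {

    // Reflective idiom (the only option before Java 7): the callee is named by a
    // string, and the receiver and result flow through Object-typed values.
    static int viaReflection(String s) {
        try {
            Method m = String.class.getMethod("length");
            return (Integer) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Method handle: the lookup states receiver class, method name, and the exact
    // signature ()int; invokeExact requires the call site type (String)int to match.
    static int viaMethodHandle(String s) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaReflection("bytecode"));   // prints 8
        System.out.println(viaMethodHandle("bytecode")); // prints 8
    }
}
```

Both calls reach String.length(). In the second form, the handle's type is available alongside the target description, which is the kind of explicit information the sentence above refers to.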

Methods

Results

Discussion

Conclusion