Automatic Modeling of Opaque Code for JavaScript Static Analysis

Joonyoung Park, Alexander Jordan, Sukyoung Ryu

doi:10.1007/978-3-030-16722-6_3

Abstract

Static program analysis often encounters problems in analyzing library code. Most real-world programs use library functions intensively, and library functions are usually written in different languages. For example, static analysis of JavaScript programs requires analysis of the standard built-in library implemented in host environments. A common approach to analyze such opaque code is for analysis developers to build models that provide the semantics of the code. Models can be built either manually, which is time consuming and error prone, or automatically, which may limit application to different languages or analyzers. In this paper, we present a novel mechanism to support automatic modeling of opaque code, which is applicable to various languages and analyzers. For a given static analysis, our approach automatically computes analysis results of opaque code via dynamic testing during static analysis. By using testing techniques, the mechanism does not guarantee sound over-approximation of program behaviors in general. However, it is fully automatic, is scalable in terms of the size of opaque code, and provides more precise results than conventional over-approximation approaches. Our evaluation shows that although not all functionalities in opaque code can (or should) be modeled automatically using our technique, a large number of JavaScript built-in functions are approximated soundly yet more precisely than existing manual models.

Highlights

Static analysis is widely used to optimize programs and to find bugs in them, but it often faces difficulties in analyzing library code
We experimentally show that this simple heuristic works well for automatic modeling of JavaScript built-in functions
We present a Sample-Run-Abstract approach (⇓SRA) as a promising way to perform static analysis in the presence of opaque code using automated on-demand modeling
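
The Sample-Run-Abstract idea can be illustrated with a minimal sketch: draw concrete samples from an abstract input, run the opaque function on each sample, then abstract and join the results. Everything below is an assumption made for illustration, not the authors' implementation: the `Sign` domain, the fixed sample sets, and the names `sample`, `alpha`, `join`, and `sra` are all hypothetical. Because it relies on a finite set of test inputs, such a sketch does not guarantee a sound over-approximation in general, which matches the caveat stated in the abstract.

```typescript
// Hypothetical sign domain standing in for an analyzer's abstract number domain.
type Sign = "Bot" | "Neg" | "Zero" | "Pos" | "Top";

// Sample: choose a finite set of concrete values represented by an abstract value.
// The specific values are arbitrary assumptions, not a heuristic from the paper.
function sample(a: Sign): number[] {
  switch (a) {
    case "Bot": return [];
    case "Neg": return [-1, -2.5, -1000];
    case "Zero": return [0];
    case "Pos": return [1, 2.5, 1000];
    case "Top": return [-1, -2.5, -1000, 0, 1, 2.5, 1000];
  }
}

// Abstract: map a single concrete result back into the sign domain.
function alpha(n: number): Sign {
  if (Number.isNaN(n)) return "Top"; // NaN has no sign, so fall back to Top
  if (n < 0) return "Neg";
  if (n > 0) return "Pos";
  return "Zero";
}

// Join: least upper bound of two abstract values in this flat domain.
function join(a: Sign, b: Sign): Sign {
  if (a === "Bot") return b;
  if (b === "Bot") return a;
  return a === b ? a : "Top";
}

// Sample, run the opaque function on every sample, then abstract and join.
function sra(opaque: (x: number) => number, input: Sign): Sign {
  return sample(input)
    .map((x) => opaque(x))
    .map((y) => alpha(y))
    .reduce<Sign>(join, "Bot");
}

console.log(sra(Math.abs, "Neg")); // abstract result for negative inputs
console.log(sra(Math.abs, "Top")); // abstract result for unknown inputs
```

In this sketch `Math.abs` plays the role of opaque code: the analysis never inspects its body and only observes its outputs on the sampled inputs. A conventional over-approximation would return `Top` for any call to an unmodeled function, whereas the sampled result for a negative input is the more precise `Pos`.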

Summary

Introduction

Static analysis is widely used to optimize programs and to find bugs in them, but it often faces difficulties in analyzing library code. A conventional approach to analyzing such opaque code is for analysis developers to create models that provide the analysis results of the opaque code. Models approximate the behaviors of opaque code, and they are often tightly integrated with specific static analyzers to support precise abstract semantics that are compatible with the analyzers' internals. Various approaches have been proposed to model opaque code automatically. They create models either from specifications of the code's behaviors [2,26] or from dynamic information gathered during execution of the code [8,9,22]. The former approach depends heavily on the quality and format of the available specifications, and the latter is limited by the capabilities of the instrumentation or of specific analyzers.

Methods

Results

Conclusion