Previous work has shown that taint analyses are only useful if correctly customized to the context in which they are used. Existing domain-specific languages (DSLs) allow such customization through the definition of deny-listing data-flow rules that describe potentially vulnerable or malicious taint-flows. These languages, however, are designed primarily for security experts who are expected to be knowledgeable in taint analysis. Software developers, however, consider these languages to be complex. This paper thus presents fluent TQL, a query specification language particularly for taint-flows. fluentTQL is internal Java DSL and uses a fluent-interface design. fluentTQL queries can express various taint-style vulnerability types, e.g. injections, cross-site scripting or path traversal. This paper describes fluentTQL’s abstract and concrete syntax and defines its runtime semantics. The semantics are independent of any underlying analysis and allows evaluation of fluent TQL queries by a variety of taint analyses. Instantiations of fluentTQL, on top of two taint analysis solvers, Boomerang and FlowDroid, show and validate fluent TQL expressiveness. Based on existing examples from the literature, we have used fluentTQL to implement queries for 11 popular security vulnerability types in Java. Using our SQL injection specification, the Boomerang-based taint analysis found all 17 known taint-flows in the OWASP WebGoat application, whereas with FlowDroid 13 taint-flows were found. Similarly, in a vulnerable version of the Java Spring PetClinic application, the Boomerang-based taint analysis found all seven expected taint-flows. In seven real-world Android apps with 25 expected malicious taint-flows, 18 taint-flows were detected. In a user study with 26 software developers, fluentTQL reached a high usability score. In comparison to CodeQL, the state-of-the-art DSL by Semmle/GitHub, participants found fluentTQL more usable and with it they were able to specify taint analysis queries in shorter time.
Read full abstract