Topology-Inspired Method Recovers Obfuscated Term Information From Induced Software Call-Stacks

Kelly Maggs, Vanessa Robins

doi:10.3389/fams.2021.668082

Abstract

Fuzzing is a systematic, large-scale search for software vulnerabilities, achieved by feeding a sequence of randomly mutated input files to the program of interest with the goal of inducing a crash. The information about inputs, software execution traces, and induced call stacks (crashes) can be used to pinpoint and fix errors in the code, or can be exploited to damage an adversary’s computer software. In black-box fuzzing, the primary unit of information is the call stack: a list of nested function calls and line numbers that reports what the code was executing at the time it crashed. The source code is not always available in practice, and in some situations even the function names are deliberately obfuscated (i.e., removed or given generic names). We define a topological object called the call-stack topology to capture the relationships between module names, function names and line numbers in a set of call stacks obtained via black-box fuzzing. In a proof-of-concept study, we show that structural properties of this object, in combination with two elementary heuristics, allow us to build a logistic regression model to predict the locations of distinct function names over a set of call stacks. We show that this model can extract function name locations with around 80% precision in data obtained from fuzzing studies of various Linux programs. This has the potential to benefit software vulnerability experts by increasing their ability to read and compare call stacks more efficiently.
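
To make the precision figure above concrete, the following minimal sketch shows how precision could be computed for a set of predicted function-name locations. It is an illustration under stated assumptions, not the authors' evaluation code: the `Location` addressing scheme (stack index, frame index, column index), the `precision` helper, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical addressing scheme (an assumption for illustration):
# a term location is (call-stack index, frame index, column index).
Location = Tuple[int, int, int]


def precision(predicted: Set[Location], actual: Set[Location]) -> float:
    """Fraction of predicted function-name locations that are correct."""
    if not predicted:
        # No predictions were made; report 0.0 rather than divide by zero.
        return 0.0
    true_positives = len(predicted & actual)
    return true_positives / len(predicted)


# Toy data, invented for this sketch: five predicted locations,
# four of which coincide with true function-name locations.
predicted = {(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 2), (2, 0, 1)}
actual = {(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1), (2, 0, 1)}

score = precision(predicted, actual)
print(f"precision = {score:.2f}")
```

Precision counts only how many of the predicted locations are correct; it says nothing about true function-name locations that the model fails to flag, which would be measured by recall.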

Highlights

A black-box fuzzing campaign is one conducted without explicit knowledge of the source code or its intermediate representations
The results show that the inclusion of call-stack topology features significantly improves the quality of prediction across terms when compared with the null model
Line numbers are likely to be in larger equivalence classes, with a low weighted out-degree. This means terms are unlikely to depend on line numbers

Summary

Introduction

A black-box fuzzing campaign is one conducted without explicit knowledge of the source code or its intermediate representations. Methods in this area require brute-force generation of inputs, which can produce large numbers of crashes, many of them duplicates of one another. The call-stack is a record of the nested functions traced out by the program in its final moments, and is one of the few pieces of information available to us when analyzing the results of black-box fuzzing. The lines in the call-stack are called frames and, while contingent on the operating system’s debugging syntax, decompose roughly into three columns: 1) the module (or filename), 2)

Objectives

Methods

Results

Conclusion