Software call stack is a sequence of function calls that are executed during the runtime of a software program. Software call stack analysis (CSA) is widely used in software engineering to analyze the runtime behavior of software, which can be used to optimize the software performance, identify bugs, and profile the software. Despite the benefits of CSA, it has recently come under scrutiny due to concerns about privacy. To date, software is often deployed at user-side devices like mobile phones and smart watches. The collected call stacks may thus contain privacy-sensitive information, such as healthy information or locations, depending on the software functionality. Leaking such information to third parties may cause serious privacy concerns such as discrimination and targeted advertisement. This paper presents PP-CSA, a practical and privacy-preserving CSA framework that can be deployed in real-world scenarios. Our framework leverages local differential privacy (LDP) as a principled privacy guarantee, to mutate the collected call stacks and protect the privacy of individual users. Furthermore, we propose several key design principles and optimizations in the technical pipeline of PP-CSA, including an encoder-decoder scheme to properly enforce LDP over software call stacks, and several client/server-side optimizations to largely improve the efficiency of PP-CSA. Our evaluation over real-world Java and Android programs shows that our privacy-preserving CSA pipeline can achieve high utility and privacy guarantees while maintaining high efficiency. We have released our implementation of PP-CSA as an open-source project at https://github.com/wangzhaoyu07/PP-CSA for results reproducibility. We will provide more detailed documents to support and the usage and extension of the community.
Read full abstract