A Hypothesis Testing Approach to Sharing Logs with Confidence

Yunhui Long, Carl A Gunter, Le Xu

doi:10.1145/3374664.3375743

Abstract

Logs generated by systems and applications contain a wide variety of heterogeneous information that is important for performance profiling, failure detection, and security analysis. There is a strong need to share logs among different parties, either to outsource the analysis or to improve system and security research. However, sharing logs may inadvertently leak confidential or proprietary information. Besides sensitive information that is saved directly in logs, such as user identifiers and software versions, indirect evidence such as performance metrics can also leak sensitive information about the physical machines and the system. In this work, we introduce a game-based definition of the risk of exposing sensitive information through released logs. We propose log indistinguishability, a property that is met only when the logs leak little information about the protected sensitive attributes. We design an end-to-end framework that allows a user to identify the risk of information leakage in logs, to limit the exposure with log redaction and obfuscation, and to release the logs with a much lower risk of exposing the sensitive attribute. Our framework contains a set of statistical tests to identify violations of the log indistinguishability property and a variety of obfuscation methods to prevent the leakage of sensitive information. The framework views the log-generating process as a black box and can therefore be applied to different systems and processes. We perform case studies on two types of log datasets: Spark event logs and hardware counters. We show that our framework is effective in preventing the leakage of the sensitive attribute with a reasonable testing time and an acceptable utility loss in logs.

Full Text