Information Leakage in Cloud Data Warehouses

Mohammad Ahmadian, Dan C Marinescu

doi:10.1109/tsusc.2018.2838520

Abstract

Information leakage is the inadvertent disclosure of sensitive information through correlation of records from several databases or collections of a cloud data warehouse. Malicious insiders pose a serious threat to cloud data security, which justifies the focus on information leakage due to rogue employees or to outsiders using the credentials of legitimate employees. The discussion in this paper is restricted to NoSQL databases with a flexible schema. Data encryption can reduce information leakage, but it is impractical to encrypt large databases and/or all fields of database documents. Encryption also limits the operations that can be carried out on the data in a database. It is thus critical to distinguish sensitive documents in a data warehouse and to concentrate efforts on protecting them. The capacity of a leakage channel, introduced in this work, quantifies the intuitively obvious means to trigger alarms when an insider attacker uses excessive computer resources to correlate information in multiple databases. The Sensitivity Analysis based on Data Sampling (SADS) introduced in this paper balances the trade-off between higher efficiency in identifying the risks posed by information leakage and the accuracy of the results obtained by sampling very large collections of documents. The paper reports on experiments assessing the effectiveness of SADS and the use of selective disinformation to limit information leakage. Cloud services that identify sensitive records and reduce the risk of information leakage are also discussed.

Full Text