Pattern Mining and Detection of Malicious SQL Queries on Anonymization Mechanism

Jianguo Zheng, Xinyu Shen

doi:10.1109/access.2021.3052956

Open Access

https://doi.org/10.1109/access.2021.3052956

Abstract

With the rapid development of big data, individual privacy and data security have gained unprecedented importance. Database anonymization mechanisms protect individual privacy by adding noise to the result of a query, striking a tradeoff between the privacy and the utility of personal data. However, corresponding attacks keep emerging, resulting in a high risk of individual identification. In this paper, we learn the patterns of malicious SQL queries and propose a novel detection method. Association rules are used to mine the patterns and features of noise-exploitation attacks, and parse trees are applied to extract features from SQL; from these we construct feature vectors and feed them into classifiers. We also propose a SQL generation method that produces query samples based on a real database for model training and testing. Experiments on the synthetic dataset show that our detection method can significantly prevent noise-exploitation attacks, including almost all differential attacks and 91% of cloning attacks, while ensuring a strong degree of data utility.

Highlights

In recent years, people have entered an era of data explosion.
When the data analyst sends a request, Diffix converts it and sends it to the database; the database then consults the real query data table of Diffix.
Its proposers claim that Diffix is an alternative to differential privacy that can provide users with unlimited queries, unlimited SQL query semantics, and minimal noise addition.

Summary

INTRODUCTION

People have entered an era of data explosion. A large amount of data is being generated, transmitted, and collected at all times, and the connection between individuals and data is growing ever closer.

Diffix has made its noise mechanism public, and the added noise follows a Gaussian distribution. With this prior knowledge, the attacker constructs suitable combinations of differential queries, analyzes the probability distribution of the returned results, and can obtain the target individual's attribute value with high probability.

After the data analyst submits a query request, the detection filter layer sends the processed SQL statement to the database and, after obtaining the relevant data, classifies the target SQL query. If the query is normal, noisy results are returned to the data analyst through the anonymization mechanism; if it is malicious, the query is prohibited. Combined with the results of the association rules, this again confirms that malicious query attacks follow a certain pattern.
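
The idea behind a differential attack on Gaussian noise can be illustrated with a small simulation. The sketch below is not the paper's attack code: the function names, the noise standard deviation, the counts, and above all the assumption that every query receives independent zero-mean Gaussian noise are hypothetical simplifications and do not describe how Diffix actually draws its noise. It only shows why averaging the difference of many noisy query pairs can reveal whether one individual is included in a count.

```python
import random
import statistics


def noisy_count(true_count, sd, rng):
    """Return true_count plus zero-mean Gaussian noise (assumed noise model)."""
    return true_count + rng.gauss(0.0, sd)


def differential_attack(count_with, count_without, sd, n_pairs, rng):
    """Average the noisy difference of n_pairs query pairs.

    count_with    -- true answer of the query that may include the victim
    count_without -- true answer of the query that excludes the victim
    Returns (guess, mean_diff); guess is True when the averaged difference
    is closer to 1 (victim included) than to 0 (victim absent).
    """
    diffs = [
        noisy_count(count_with, sd, rng) - noisy_count(count_without, sd, rng)
        for _ in range(n_pairs)
    ]
    mean_diff = statistics.fmean(diffs)
    return mean_diff > 0.5, mean_diff


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Victim has the attribute: the two true counts differ by exactly one.
guess_present, mean_present = differential_attack(101, 100, 1.0, 2000, rng)

# Victim lacks the attribute: the two true counts are identical.
guess_absent, mean_absent = differential_attack(100, 100, 1.0, 2000, rng)

print(guess_present, guess_absent)
```

Under these assumptions the standard error of the averaged difference shrinks with the square root of the number of query pairs, which is why a single noisy answer protects the individual but a long series of related queries may not. A detection layer of the kind described above aims to recognize such query combinations before they are answered.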

COLLECTION AND PREPROCESSING OF SQL

Findings

VIII. CONCLUSION