Summary

This paper summarizes the current state of research on reducing the risk of disclosure related to what may be called “non‐traditional” outputs for statistical agencies. Whereas traditional outputs include frequency tables, magnitude tables and public use microdata files, non‐traditional outputs include outputs associated with user‐defined exploratory data analysis and statistical modelling offered through a remote analysis system. In remote analysis, a system accepts a query from an analyst, runs it on data held in a secure environment, and then returns the results to the analyst. There is considerable current interest in fully automated remote analysis systems, because these have the potential to enable agencies to respond to growing researcher demand for more, and more detailed, data. In practice, a range of protective measures is most effective in remote analysis, and the choice of this range depends heavily on the context, including the regulatory environment, the dataset itself, and the purpose of the access.

This paper provides a summary of known attack methods on remote analysis system outputs, focussing on exploratory data analysis and linear regression. The paper also summarizes the associated suggested protective measures designed to prevent disclosures and thwart attacks in fully automated remote analysis systems. Some commentary on the attacks and measures is provided.