Abstract

National statistical agencies and other data custodians hold a wealth of data on individuals and organizations, collected from censuses, surveys and administrative sources. In many cases, these data are made available to external researchers for the investigation of questions of social and economic importance. To enhance access to this information, several national statistical agencies are developing remote analysis systems (RAS) designed to accept queries from a researcher, run them on data held in a secure environment, and then return the results. RAS prevent a researcher from accessing the underlying data, and most rely on manual checking to ensure that the responses have acceptably low disclosure risk. However, the need for scalability and consistency will increasingly require automated methods. We propose a RAS output confidentialization procedure based on statistical bootstrapping that automates disclosure control while achieving a provably good balance between disclosure risk and usefulness of the responses. The bootstrap masking mechanism is easy to implement for most statistical queries, yet the characteristics of the bootstrap distribution assure us that it is also effective in providing both useful responses and low disclosure risk. Interestingly, our proposed bootstrap masking mechanism represents an ideal application of Efron's bootstrap: one that takes advantage of all the theoretical properties of the bootstrap, without ever having to construct the bootstrap distribution.
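
The abstract does not spell out the mechanism. One plausible reading, offered here only as an illustrative sketch, is that the system answers a query with the statistic computed on a single bootstrap resample of the protected data. The response is then one draw from the bootstrap distribution, which never has to be constructed in full. The function name `bootstrap_masked_response`, the toy `incomes` data, and the fixed seed are all hypothetical and are not taken from the paper.

```python
import random
import statistics


def bootstrap_masked_response(data, statistic, rng):
    """Return `statistic` evaluated on one bootstrap resample of `data`.

    Assumption (not confirmed by the abstract): masking means releasing
    a single bootstrap replicate instead of the exact statistic.
    """
    n = len(data)
    # Draw n observations with replacement from the original data.
    resample = [data[rng.randrange(n)] for _ in range(n)]
    return statistic(resample)


# Hypothetical protected data held inside the secure environment.
incomes = [31.0, 42.5, 38.2, 55.1, 47.3, 29.8, 61.0, 44.4]

# Seeded only so that the sketch is reproducible.
rng = random.Random(0)
masked_mean = bootstrap_masked_response(incomes, statistics.mean, rng)
exact_mean = statistics.mean(incomes)  # never released to the researcher
```

In this sketch the researcher would see only `masked_mean`. A real system would presumably use a non-reproducible random source and would need to handle repeated queries, which this example does not attempt.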

