Abstract

Background: Blockchain has emerged as a decentralized and distributed framework that enables tamper-resilience and, thus, practical immutability for stored data. This immutability property is important in scenarios where auditability is desired, such as in maintaining access logs for sensitive healthcare and biomedical data. However, the underlying data structure of blockchain, by default, does not provide capabilities to efficiently query the stored data. In this investigation, we show that it is possible to efficiently run complex audit queries over the access log data stored on blockchains by using additional key-value stores. This paper specifically reports on the approach we designed for the blockchain track of the iDASH Privacy & Security Workshop 2018 competition. In this track, participants were asked to devise an efficient way to run conjunctive equality and range queries on a genomic dataset access log trail after storing it in a permissioned blockchain network consisting of 4 identical nodes, each representing a different site, created with the Multichain platform.

Methods: Multichain duplicates and indexes blockchain data locally at each node in a key-value store to support retrieval requests at a later point in time. To efficiently leverage the key-value storage mechanism, we applied various techniques and optimizations, such as bucketization, simple data duplication and batch loading, by accounting for the required query types of the competition and the interface provided by Multichain. In particular, we implemented our solution and compared its loading and query-response performance with SQLite, a commonly used relational database, using the data provided by the iDASH 2018 organizers.

Results: Depending on the query type and the data size, the run time difference between blockchain-based query-response and SQLite-based query-response ranged from 0.2 seconds to 6 seconds. A deeper inspection revealed that range queries were the bottleneck of our solution which, nevertheless, scales up linearly.

Conclusions: This investigation demonstrates that blockchain-based systems can provide reasonable query-response times to complex queries even if they only use simple key-value stores to manage their data. Consequently, we show that blockchains may be useful for maintaining data with auditability and immutability requirements across multiple sites.
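
To make the Methods concrete, the following is a minimal sketch of how bucketization and data duplication could support range and conjunctive equality queries on a store that offers only exact-key lookup. It is an illustration under stated assumptions, not the authors' implementation: the in-memory `KeyValueLog` class stands in for the per-node key-value store, it does not use the Multichain interface, and the field names (`user`, `resource`, `timestamp`) and the bucket width are hypothetical.

```python
BUCKET_SIZE = 3600  # hypothetical bucket width (seconds); not taken from the paper


class KeyValueLog:
    """Toy in-memory stand-in for a store that supports only exact-key lookup."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        # Several records may share a key, so each key maps to a list.
        self.store.setdefault(key, []).append(value)

    def get(self, key):
        return self.store.get(key, [])


def load(kv, records):
    """Store every record under one key per queryable field (data duplication)."""
    for rec in records:
        kv.put(("user", rec["user"]), rec)
        kv.put(("resource", rec["resource"]), rec)
        # Bucketization: map the timestamp to a coarse bucket id so that a
        # range query becomes a small number of exact-key lookups.
        kv.put(("bucket", rec["timestamp"] // BUCKET_SIZE), rec)


def range_query(kv, start, end):
    """Return records with start <= timestamp <= end."""
    hits = []
    for bucket in range(start // BUCKET_SIZE, end // BUCKET_SIZE + 1):
        for rec in kv.get(("bucket", bucket)):
            # Edge buckets can hold records outside the range; filter them out.
            if start <= rec["timestamp"] <= end:
                hits.append(rec)
    return hits


def conjunctive_query(kv, user, start, end):
    """Equality on user AND range on timestamp: look up by user, then filter."""
    return [rec for rec in kv.get(("user", user))
            if start <= rec["timestamp"] <= end]
```

In this sketch, a range query touches every bucket between the two endpoints, so its cost grows with the width of the range, which is consistent with range queries being the more expensive query type. A smaller bucket width means less filtering per bucket but more lookups per query.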

Highlights

  • Blockchain has emerged as a decentralized and distributed framework that enables tamper-resilience and, thus, practical immutability for stored data

  • A blockchain-based data storage solution eliminates the potential for a single point of failure because data is replicated across multiple entities. Due to their immutability and auditability guarantees, blockchains are useful for storing access logs from multiple sites to different datasets

  • We report on the approach we designed for the blockchain track of the iDASH Privacy & Security Workshop 2018 competition [6]


Summary

Introduction

Blockchain has emerged as a decentralized and distributed framework that enables tamper-resilience and, thus, practical immutability for stored data. This paper reports on the approach we designed for the blockchain track of the iDASH Privacy & Security Workshop 2018 competition. In this track, participants were asked to devise an efficient way to run conjunctive equality and range queries on a genomic dataset access log trail after storing it in a permissioned blockchain network consisting of 4 identical nodes, each representing a different site, created with the Multichain platform. One notable benefit is that a blockchain-based data storage solution mitigates security issues that can arise from malicious administrators. It also eliminates the potential for a single point of failure because data is replicated across multiple entities. Due to their immutability and auditability guarantees, blockchains are useful for storing access logs from multiple sites to different datasets (e.g., genomic research data). This is because access logs require strong auditability (e.g., auditing accesses to genomic information), transparency (e.g., publicly verifying that certain data is not misused by checking the logs) and tamper-resistance (e.g., preventing an attacker from manipulating the stored logs).
