Abstract
The ability to perform fast analysis on massive public blockchain transaction data is needed in various applications, such as tracing fraudulent financial transactions. Blockchain data grows continuously and is organized as a sequence of blocks containing transactions. This organization, however, is not suitable for parallel graph algorithms, which need efficient distributed graph data structures. Using message passing libraries (MPI), we develop a scalable cluster-based system that constructs a distributed transaction graph in parallel and implement various transaction analysis algorithms. We report performance results from our system operating on roughly 5 years of Ethereum Mainnet blockchain data (10.2 million blocks). We report timings for distributed transaction graph construction, partitioning, page ranking of addresses, degree distribution, token transaction counting, connected components finding, and our new parallel blacklisted address trace forest computation algorithm, all run on an economical 16-node cluster set up on the Amazon cloud. Our system constructs a distributed graph of 766 million transactions in 218 s and computes the forest of blacklisted address traces in 32 s.
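To make the two central ideas concrete (turning a sequence of blocks into a transaction graph, and tracing funds outward from blacklisted addresses), the following is a minimal serial sketch. It is not the paper's parallel MPI algorithm. The function names, the `(sender, receiver)` tuple format, and the toy addresses are all assumptions made for illustration, and plain breadth-first search stands in for whatever traversal the authors use.

```python
from collections import defaultdict, deque


def build_transaction_graph(blocks):
    """Flatten a sequence of blocks into an adjacency map: sender -> receivers.

    Each block is assumed to be a list of (sender, receiver) pairs.
    """
    graph = defaultdict(list)
    for block in blocks:
        for sender, receiver in block:
            graph[sender].append(receiver)
    return graph


def trace_forest(graph, blacklisted):
    """Breadth-first trace from every blacklisted address.

    Returns a parent map describing a forest: each root maps to None and
    every other reached address maps to the address it was first reached from.
    """
    parent = {}
    queue = deque()
    for root in blacklisted:
        if root not in parent:
            parent[root] = None
            queue.append(root)
    while queue:
        addr = queue.popleft()
        # .get avoids creating empty entries for addresses that never send
        for nxt in graph.get(addr, ()):
            if nxt not in parent:
                parent[nxt] = addr
                queue.append(nxt)
    return parent


# Toy data: three blocks, two blacklisted addresses (hypothetical names).
blocks = [
    [("A", "B"), ("C", "D")],
    [("B", "E"), ("D", "E")],
    [("F", "G")],
]
graph = build_transaction_graph(blocks)
forest = trace_forest(graph, ["A", "C"])
print(forest)
```

In this toy run, addresses F and G never appear in the forest because no blacklisted address reaches them. A distributed version would additionally have to partition the adjacency map across processes and exchange frontier addresses between them, which this sketch deliberately leaves out.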
Highlights
Public blockchain platforms that operate autonomously under the control of no one have become popular globally
Using message passing libraries (MPI), we develop a scalable cluster-based system that constructs a distributed transaction graph in parallel and implement various transaction analysis algorithms
We report timings obtained from tests involving distributed transaction graph construction, partitioning, page ranking of addresses, degree distribution, token transaction counting, connected components finding and our new parallel blacklisted address trace forest computation algorithm on a 16 node economical cluster set up on the Amazon cloud
Summary
Public blockchain platforms that operate autonomously under the control of no one have become popular globally. The field of finance needs a system that can quickly trace fraudulent activities in massive public blockchain transaction data. This need has led to the emergence of firms such as Chainalysis [3], which is highly valued, and CipherTrace, which has recently been acquired [4]. These developments provide evidence that scalable, parallel systems capable of analyzing big blockchain transaction graph data will be needed in the near future. This is the problem addressed in this paper.