Bitcoin is the most widely used crypto-currency, and one of the most studied. Thanks to the open nature of the Blockchain, transaction records are freely accessible and can be analyzed by anyone. The first step in most analytics work is to group anonymous addresses into a set of addresses, called aggregates, that are meant to correspond to unique actors. In this paper, we propose new methods to discover more accurate address aggregates using supervised learning. We introduce a way to create a labeled training set based on reliable heuristics and external information, and propose two methods. The first method automatically finds address aggregates from a set of transactions. The second one improves an address aggregate of a target actor by specializing the training for a single actor. We empirically validate our results on large-scale datasets. A striking result of our analysis is that training a model to recognize the change addresses of a particular actor is more efficient than using a larger dataset that does not target that particular actor. In doing so, we clearly show the feasibility and interest of supervised machine learning to identify Bitcoin actors.
Read full abstract