Abstract Distributed in-memory databases are widely adopted to achieve low latency and high bandwidth for data-intensive applications. They scale out by sharding data and distributing it across multiple nodes. To adapt efficiently to varying workloads, they must be able to migrate shards across nodes. In this paper, we demonstrate that state-of-the-art approaches suffer significant performance degradation during migration due to service downtime and redundant data transfer. Furthermore, our findings indicate that service downtime constrains the scalability of migration strategies, while redundant data transfer during the snapshot transfer phase limits their adaptability to dynamic workloads. To address these problems, this paper proposes Aion, a live migration strategy designed for distributed in-memory databases. Aion eliminates service downtime by immediately switching transaction routing to the destination node. To ensure data consistency between the source and destination nodes, as well as serializable execution during migration, Aion introduces a mutual validation phase. Moreover, Aion adds an analysis phase before the snapshot transfer phase to identify dynamically changing hotspots in workloads. The analysis phase identifies and transfers tuples and versions accessed less frequently to the destination node, reducing the amount of data transferred. Aion is implemented on a distributed in-memory database and evaluated using various OLTP workloads. The results demonstrate that Aion eliminates service downtime, adapts effectively to various workloads, and exhibits robust scalability. Compared to state-of-the-art approaches, Aion achieves up to 2.25x–6.57x higher throughput during migration and shortens the migration duration by 53.7–68.2%.