Memory disaggregation separates compute (CPU) and main memory resources into disjoint physical units to enable elastic and independent scaling. Connected via high-speed RDMA-enabled networks, compute nodes can directly access remote memory. Because memory nodes have near-zero compute power, this setting often requires complex protocols with many network round trips. In this paper, we design a scalable distributed inverted list index for disaggregated memory architectures. An inverted list index maps each term to the list of documents that contain it. Current solutions partition the index either horizontally or vertically, but both approaches suffer severe limitations in the disaggregated memory setting due to data and access skew, high network latency, or out-of-memory errors. Our method partitions lists into fixed-size blocks and spreads them across the memory nodes to balance skewed accesses. Block-based list processing keeps the memory footprint of compute nodes low and masks latency by interleaving remote accesses with expensive list operations. In addition, we propose efficient updates with optimistic concurrency control and read-write conflict detection. Our experiments confirm the efficiency and scalability of our method.
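
To make the block-based partitioning concrete, the following minimal Python sketch splits one term's posting list into fixed-size blocks and spreads them across memory nodes. It is illustrative only: the function names, the tiny block size, and the hash-based placement policy are assumptions for exposition, not the paper's actual implementation.

    import hashlib

    BLOCK_SIZE = 4  # document IDs per block; a real system would size blocks in KB

    def partition_into_blocks(term, posting_list, num_memory_nodes):
        # Split one term's posting list into fixed-size blocks and assign each
        # block to a memory node. Hashing (term, block number) spreads even a
        # single hot list over many nodes (assumed placement policy).
        placement = []
        for i in range(0, len(posting_list), BLOCK_SIZE):
            block = posting_list[i:i + BLOCK_SIZE]
            block_no = i // BLOCK_SIZE
            key = f"{term}:{block_no}".encode()
            node = int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % num_memory_nodes
            placement.append((node, block))
        return placement

    # Example: a long posting list for a hot term is spread over 3 memory nodes,
    # so no single node absorbs all accesses for that term.
    docs = [2, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    for node, block in partition_into_blocks("database", docs, num_memory_nodes=3):
        print(f"memory node {node}: block {block}")

Because consecutive blocks of a skewed list land on different nodes, read traffic for a hot term is balanced across the memory pool; at query time, a compute node can likewise issue the remote read for the next block while it processes the current one, which is how block-based processing masks network latency.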