Abstract
We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ data processing (i.e., operating on data as it is streamed to storage). One important example of this approach is in-situ data indexing. Prior work has shown the feasibility of indexing at scale as a two-step process. First, one partitions data by key across the CPU cores of a parallel job. Then each core indexes its subset as data is persisted. Online partitioning requires transferring data over the network so that it can be indexed and stored by the core responsible for it. This approach is becoming increasingly costly as new computing platforms emphasize parallelism over the individual core performance that is crucial to communication libraries and systems software in general. In addition to indexing, scalable online data partitioning is also useful in other contexts such as load balancing and efficient compression. We present FilterKV, an efficient data management scheme for fast online data partitioning of key-value (KV) pairs. FilterKV reduces the total amount of data sent over the network and to storage. We achieve this by (a) partitioning pointers to KV pairs instead of the KV pairs themselves and (b) using a compact format to represent and store KV pointers. Results from LANL show that FilterKV can reduce total write slowdown (including partitioning overhead) by up to 3x across 4096 CPU cores.
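To make the pointer-partitioning idea concrete, below is a minimal C++ sketch under stated assumptions; the struct KVPointer and the helpers owner_of and make_pointer are hypothetical names chosen for illustration and are not FilterKV's actual interface. The intent it illustrates follows points (a) and (b) of the abstract: each core appends the bulky value to its own local log and routes only a compact, fixed-size pointer record (key digest, writer rank, local offset) to the core that owns the key range, so the data crossing the network shrinks from the full KV pair to a small pointer.

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Compact, fixed-size record shuffled over the network in place of the
// (potentially much larger) KV pair itself.
struct KVPointer {
    uint64_t key_hash;    // digest of the key, used for partitioning and indexing
    uint32_t writer_rank; // core whose local log holds the value
    uint32_t offset;      // byte offset of the value within that local log
};

// Hypothetical partition function: decides which core indexes a given key.
uint32_t owner_of(uint64_t key_hash, uint32_t num_cores) {
    return static_cast<uint32_t>(key_hash % num_cores);
}

// Append the value to this core's local log and build the pointer record
// that is sent to owner_of(key_hash, num_cores) instead of the full pair.
KVPointer make_pointer(const std::string& key, const std::string& value,
                       uint32_t my_rank, std::vector<char>& local_log) {
    const uint64_t h = std::hash<std::string>{}(key);
    KVPointer p{h, my_rank, static_cast<uint32_t>(local_log.size())};
    local_log.insert(local_log.end(), value.begin(), value.end());
    return p;
}

In this sketch only the 16-byte KVPointer is partitioned and stored by the owning core, while the value stays where it was written; how FilterKV actually encodes and resolves its pointers is described in the paper, not here.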