Abstract
Bloom filters are a very important tool for many applications including genomics, where they are used as a compact data structure for counting k-mers, represent de Bruijn graphs, and more. Due to their random-access nature coupled with the large size required for genomics, Bloom filters for genomics can easily become bound by the random access performance of off-chip memory. This is especially true for accelerators such as FPGAs and GPUs, which can easily remove the computation overhead of the multiple hash functions. As a result, Bloom filter accelerators have typically focused either on small filters which can fit in fast on-chip memory, or require fast off-chip memory fabric such as Hybrid Memory Cubes. In this work, we present BunchBloomer, which improves the cost-effectiveness of FPGA Bloom filter accelerators by making better use of cheaper, lower-power DDR memory. BunchBloomer uses a multi-layer radix sorter to group table updates into bursts directed to the same 8 KiB memory region, which can be efficiently cached in on-chip memory. A single BunchBloomer device outperforms a costly 12-core server by over 2×, demonstrating an order of magnitude better power efficiency. It even achieves better power efficiency compared to published FPGA Bloom filter accelerators equipped with Hybrid Memory Cubes.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have