With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators are diminished by growing I/O overheads. While there have been several research efforts into computational storage and FPGA implementations of the NVMe interface, to our knowledge only very limited efforts have been made to move larger parts of the Linux block I/O stack into FPGA-based hardware accelerators. Our hardware/software framework DeLiBA initially addressed this deficiency by enabling high-productivity development of the software components of the I/O stack in user space instead of kernel space, and by leveraging a proven FPGA SoC framework to quickly compose and deploy the actual FPGA-based I/O accelerators. In its initial form, it achieved 10% higher throughput and up to 2.3× the I/O operations per second (IOPS) for a proof-of-concept Ceph accelerator running in a real multi-node Ceph cluster. In DeLiBA2, we have extended the framework to better support distributed storage systems, specifically by integrating the block I/O accelerators directly with a hardware-accelerated network stack, and by accelerating additional storage functions. With these improvements, performance grows significantly: the cluster-level speed-ups now reach up to 2.8× for both throughput and IOPS relative to software-only Ceph in synthetic benchmarks, and yield an end-to-end wall-clock speed-up of 20% for the real workload of building a large software package.