Task migration plays an important role in load balancing and energy savings in data centers. It also challenges service providers to minimize service interruptions during task migration. FPGA computing requires checkpointing as an essential function for hardware task migration. However, the current methods of implementing such a function for FPGAs have a high cost in hardware resources and significant degradation in performance. To overcome these problems, in this paper we propose a system using checkpointing at the hardware description language (HDL) level for hardware task migration. First, we propose a hardware task migration scheme in which checkpointing procedures and context transfer can overlap to reduce the service downtime. Second, we present a new checkpointing architecture for FPGAs that flattens the structure of nested modules at the HDL level. Third, we propose a static analysis of the original HDL source code to reduce the cost of hardware. Fourth, we introduce a Python-based tool to generate the checkpointing architecture at the HDL level. We evaluated our checkpointing architecture and the migration scheme using four application benchmarks running on a heterogeneous FPGA cluster. Our evaluations showed that the migration downtime was minimized at only 1.251 ms in the S-Search benchmark. When compared with a tree-based checkpointing architecture, the proposed architecture with the static analysis can reduce the LUT overhead by up to 50%, on the average. The checkpointing hardware caused small degradation in the maximum clock frequency (1.66% on the average), and consumed small memory footprints. Other comparisons with the previous hardware task migration scheme highlight the advantages of our migration scheme.
Read full abstract