Extreme-scale scientific collaborations require high-performance, wide-area, end-to-end data transport to move large data volumes quickly and securely among collaborating institutions. GridFTP is the de facto protocol for large-scale data transfer in science environments. Predominant network transport protocols such as TCP have serious limitations: they consume significant CPU power and prevent GridFTP from achieving high throughput on long-haul networks with high latency and potential packet loss, reordering, and jitter. Protocols such as UDT address some of TCP's shortcomings but demand substantial computing resources on data transfer nodes (DTNs). These limitations have led to underutilization of existing high-bandwidth links in scientific and collaborative grids. To address this situation, we have enhanced Globus GridFTP, the most widely used GridFTP implementation, by developing transport offload engines for protocols such as UDT and iWARP on SmartNIC, a programmable 10GbE network interface card (NIC). Our results show a significant reduction in server utilization and sustained full line-rate bandwidth in high-latency networks, measured at up to 100 ms of network latency. We also offload OpenSSL to SmartNIC to reduce host utilization for secure file transfers. The offload engine can provide line-rate data channel encryption and decryption on top of UDT offload without consuming additional host CPU resources. Lower CPU utilization increases server capacity, allowing DTNs to support higher network and data-processing rates. Alternatively, smaller or fewer DTNs can meet a given data rate requirement.