Abstract

BXI, Bull eXascale Interconnect, is the new interconnection network developed by Atos for high-performance computing. It has been designed to meet the requirements of exascale supercomputers. At such scale, faults have to be expected and dealt with transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is based on a clear separation between two modes of routing tables computation: offline mode used during bring-up and online mode used to deal with link failures and recoveries. This new architecture is presented along with several offline and online routing algorithms and their actual performance: the full routing tables for a 64k-node fat-tree can be computed in a few minutes in offline mode; and the online mode can withstand numerous inter-router link failures without any noticeable impact on running applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call