Collective communication, specifically the allreduce pattern in message-passing systems, is optimised based on measurements taken at the installation time of the library. The algorithms to be used are selected in an initialisation phase of the communication, as so-called persistent collective communication, introduced in the message-passing interface (MPI) standard. Our allreduce algorithms build on the patterns reduce_scatter and allgatherv, which are also considered as standalone operations. For the allreduce pattern with short messages, the existing cyclic shift algorithm (Bruck's algorithm) is applied with a prefix operation. For allreduce with long messages, our algorithm is composed of reduce_scatter and allgatherv, where the cyclic shift algorithm is applied with a flexible number of communication ports per node. The algorithms designed for equal message sizes are extended to non-equal message sizes by a heuristic for rank reordering. Medium message sizes are communicated with an incomplete reduce_scatter followed by allgatherv. Furthermore, an optional recursive application of the cyclic shift algorithm is provided. All algorithms operate at the node level: the data is gathered and scattered by the cores within each node, and the communication algorithms are applied across the nodes. In general, our approach outperforms the non-persistent counterparts in established MPI libraries by up to one order of magnitude or shows equal performance, with a few exceptions at particular node counts and message sizes.
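The long-message scheme above composes allreduce from a reduce_scatter followed by an allgatherv. The following toy Python sketch illustrates that decomposition over a set of simulated ranks; it is purely illustrative (no MPI, no network, all function names are our own) and is not the paper's implementation, which exchanges blocks by cyclic shifts over the interconnect.

```python
# Toy simulation of allreduce = reduce_scatter + allgatherv over p "ranks".
# Illustrative only: each rank's buffer is a plain Python list, and the
# network exchange is replaced by direct access to all buffers.

def reduce_scatter(buffers):
    """Rank i ends up with the elementwise sum of block i across all ranks."""
    p = len(buffers)
    n = len(buffers[0])
    block = n // p  # assume n divisible by p, for simplicity
    scattered = []
    for i in range(p):
        lo, hi = i * block, (i + 1) * block
        reduced = [sum(buf[k] for buf in buffers) for k in range(lo, hi)]
        scattered.append(reduced)
    return scattered

def allgatherv(scattered):
    """Every rank gathers all (possibly unequal-sized) blocks."""
    full = [x for blk in scattered for x in blk]
    return [list(full) for _ in scattered]

def allreduce(buffers):
    # Long-message decomposition: reduce_scatter, then allgatherv.
    return allgatherv(reduce_scatter(buffers))

# Example: 4 ranks, 8 elements each; every rank receives the full sum vector.
bufs = [[r + k for k in range(8)] for r in range(4)]
result = allreduce(bufs)
expected = [sum(buf[k] for buf in bufs) for k in range(8)]
assert all(r == expected for r in result)
```

The same structure carries over to the medium-message variant sketched in the text, where the reduce_scatter is left incomplete before the allgatherv step.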