Presents a systematic approach to the development of message passing programs. Our programming model is SPMD, with communications restricted to collective operations: scan, reduction, gather, etc. The design process in such an architecture-independent language is based on correctness-preserving transformation rules that are provable in a formal functional framework. We develop a set of design rules for composition and decomposition. For example, scan followed by reduction is replaced by a single reduction, and global reduction is decomposed into two faster operations. The impact of the design rules on the target performance is estimated analytically and tested in machine experiments. As a case study, we design two provably correct, efficient programs using the Message Passing Interface (MPI) for the famous maximum segment sum problem, starting from an intuitive, but inefficient, algorithm specification.