Energy and performance improvements in stencil computations are relevant for both application developers and data center administrators. Stencils appear as the fundamental scheme in many large-scale scientific simulations and workloads. Many research efforts have focused on techniques for estimating the energy usage of HPC systems based on specific characteristics of parallel applications. In the case of stencils, we have previously concentrated on detailed estimations of energy consumption and on the energy-aware distribution of stencil computations on heterogeneous processors. However, we restricted those studies to a single heterogeneous computing node. In this paper, we show how scheduling and optimization techniques can be applied to improve the energy usage and performance of stencil computations on multi-node HPC systems with different network topologies. We formulate a scheduling model together with a new Tabu Search algorithm, called Task Movement (TM), which takes the communication hierarchies into account, to minimize the overall energy usage and the execution time of stencil computations. Experimental studies show that this algorithm solves the considered problem more efficiently than other, simpler heuristics. We present computational experiments for a reference 7-point stencil computation pattern on three commonly used low-diameter network topologies: Fat-tree, Dragonfly, and Torus. According to our studies, the most promising multi-node HPC architecture for stencil computations is based on the Torus network concept. Finally, we argue that the proposed scheduling model and the TM algorithm can be easily adopted within existing high-level parallel execution environments for automatic performance tuning of stencils.
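
For readers unfamiliar with the pattern, a 7-point stencil updates each interior cell of a 3D grid from its own value and its six face neighbors. The following is a minimal sketch of one sweep in plain Python. The function name `stencil7` and the coefficients `c_center` and `c_neighbor` are illustrative assumptions, not the reference kernel used in the experiments.

```python
def stencil7(grid, c_center=0.4, c_neighbor=0.1):
    """Apply one sweep of a 7-point stencil to a 3D grid.

    `grid` is a nested list indexed as grid[z][y][x]. Boundary cells are
    copied unchanged; only interior cells are updated. The coefficients
    are hypothetical placeholders chosen for illustration.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Copy the input so every update reads the old values (Jacobi-style sweep).
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Sum of the six face neighbors along z, y, and x.
                neighbors = (
                    grid[z - 1][y][x] + grid[z + 1][y][x]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z][y][x - 1] + grid[z][y][x + 1]
                )
                out[z][y][x] = c_center * grid[z][y][x] + c_neighbor * neighbors
    return out
```

When such a grid is partitioned across several nodes, cells on a subdomain boundary need neighbor values held by another node, which is why the placement of tasks relative to the network topology affects both execution time and energy usage.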