Systolic Array (SA) architectures are well-suited for accelerating matrix multiplications through the use of a pipelined array of Processing Elements (PEs) communicating with local connections and pre-orchestrated data movements. Even though most of the dynamic power consumption in SAs is due to multiplications and additions, pipelined data movement within the SA constitutes an additional important contributor. The goal of this work is to reduce the dynamic power consumption associated with the feeding of data to the SA, by employing both dynamic (run-time) and static (offline) techniques. At the hardware level, the proposed architecture synergistically applies bus-invert coding and zero-value clock gating. By exploiting salient attributes of state-of-the-art CNNs, such as the value distribution of the weights, the proposed SA applies appropriate encoding only to the data that exhibits high switching activity. Similarly, when one of the inputs is zero, unnecessary operations are entirely skipped. In addition to this duet of run-time techniques, the proposed methodology also leverages the inherent property of the weight matrix to remain unchanged throughout the inference phase. As such, the weight matrix is appropriately reordered offline to minimize the switching activity between consecutive values, as the matrix is repeatedly loaded into the array. The weight reordering process is formulated as a Traveling Salesman Problem (TSP) and its solution is translated into a switching-activity-aware row permutation of the weight matrix. The symbiotic combination of selectively targeted, application-aware dynamic encoding and offline weight reordering is demonstrated to reduce the switching activity by 38%, on average. This translates to an overall dynamic power reduction of 17.1%–23% when executing state-of-the-art CNN layers on an SA of size 32 × 32. These power savings scale with the array size; for an array of size 64 × 64, the proposed design consumes 29.7%–35.4% less power.