MDS matrices are important building blocks providing diffusion functionality for the design of many symmetric-key primitives. In recent years, continuous efforts are made on the construction of MDS matrices with small area footprints in the context of lightweight cryptography. Just recently, Duval and Leurent (ToSC 2018/FSE 2019) reported some 32 × 32 binary MDS matrices with branch number 5, which can be implemented with only 67 XOR gates, whereas the previously known lightest ones of the same size cost 72 XOR gates.In this article, we focus on the construction of lightweight involutory MDS matrices, which are even more desirable than ordinary MDS matrices, since the same circuit can be reused when the inverse is required. In particular, we identify some involutory MDS matrices which can be realized with only 78 XOR gates with depth 4, whereas the previously known lightest involutory MDS matrices cost 84 XOR gates with the same depth. Notably, the involutory MDS matrix we find is much smaller than the AES MixColumns operation, which requires 97 XOR gates with depth 8 when implemented as a block of combinatorial logic that can be computed in one clock cycle. However, with respect to latency, the AES MixColumns operation is superior to our 78-XOR involutory matrices, since the AES MixColumns can be implemented with depth 3 by using more XOR gates.We prove that the depth of a 32 × 32 MDS matrix with branch number 5 (e.g., the AES MixColumns operation) is at least 3. Then, we enhance Boyar’s SLP-heuristic algorithm with circuit depth awareness, such that the depth of its output circuit is limited. Along the way, we give a formula for computing the minimum achievable depth of a circuit implementing the summation of a set of signals with given depths, which is of independent interest. We apply the new SLP heuristic to a large set of lightweight involutory MDS matrices, and we identify a depth 3 involutory MDS matrix whose implementation costs 88 XOR gates, which is superior to the AES MixColumns operation with respect to both lightweightness and latency, and enjoys the extra involution property.