Efficient distribution system state estimation (DSSE) is becoming increasingly important in modern grids. Building the gain matrix is carried out at every iteration of the weighted least squares (WLS) DSSE algorithm that employs the normal equations approach, and the computation of this matrix occupies a substantial portion of the total execution time. This paper presents a two-phase algorithm for the efficient gain matrix calculation suitable for modern CPU architectures. The preparation phase creates the necessary structures for effective data streaming and code execution. The gain matrix is calculated in the second phase using the prepared data and single instruction multiple data (SIMD) intrinsics. The Jacobian measurement matrix is organized as a sparse matrix of small dense blocks with double-precision complex values. The implementation exploits memory-aligned data blocks that require fewer CPU instructions and provide higher memory access speeds, in addition to modern C++ features for utilizing the instruction pipelining technique fully. Numerical results are reported on sizeable distribution networks having up to 3124 multi-phase nodes and 9537 three-phase nodes. They show that the proposed approach for computing the gain matrix is up to three times faster than the sparse matrix–matrix multiplication function in the SuiteSparse package, which makes the procedure useful for the development of real-time DSSE software.