The problem of transmitting a common message to multiple users over the Gaussian multiple-input multiple-output broadcast channel is considered, where each user is equipped with an arbitrary number of antennas. A closed-loop scenario is assumed, for which a practical capacity-approaching scheme is developed. By applying judiciously chosen unitary operations at the transmit and receive nodes, the channel matrices are triangularized so that the resulting matrices have equal diagonals, up to a possible multiplicative scalar factor. This, along with the utilization of successive interference cancellation, reduces the coding and decoding tasks to those of coding and decoding over the single-antenna additive white Gaussian noise channel. Over the resulting effective channel, any off-the-shelf code may be used. For the two-user case, it was recently shown that such joint unitary triangularization is always possible. In this paper, it is shown that for more than two users, it is necessary to carry out the unitary linear processing jointly over multiple channel uses, i.e., to employ space-time processing. It is further shown that, even with such processing, exact triangularization, where all resulting diagonals are equal, is not always possible, and conditions for its existence are established for certain cases. When exact triangularization is not possible, an asymptotic construction is proposed that achieves the desired property of equal diagonals up to edge effects, which can be made arbitrarily small at the price of jointly processing a sufficiently large number of channel uses.
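The reduction to scalar channels can be illustrated numerically for a single user. The sketch below is not the joint multi-user triangularization developed in the paper; it only checks the standard single-user building block, under assumptions that are ours: a real-valued channel y = Hx + z with unit-variance white noise, a white unit-power input, and no transmit-side unitary. Writing the augmented matrix G = [H; I] as G = QR, the diagonal entries r_i of R give per-stream rates (1/2) log2(r_i^2) under MMSE successive interference cancellation, and these rates sum to the mutual information (1/2) log2 det(I + H^T H). All function names are hypothetical.

```python
import math
import random


def qr_diagonal(G):
    """Diagonal of R in G = QR, via modified Gram-Schmidt.

    G is a real matrix (list of rows) with full column rank.
    """
    rows, cols = len(G), len(G[0])
    # Store the matrix as a list of column vectors.
    V = [[G[i][j] for i in range(rows)] for j in range(cols)]
    diag = []
    for j in range(cols):
        norm = math.sqrt(sum(x * x for x in V[j]))
        diag.append(norm)
        q = [x / norm for x in V[j]]
        # Remove the component along q from all later columns.
        for k in range(j + 1, cols):
            proj = sum(a * b for a, b in zip(q, V[k]))
            V[k] = [b - proj * a for a, b in zip(q, V[k])]
    return diag


def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det


def sic_rates(H):
    """Per-stream rates in bits per real channel use (assumed model above)."""
    nt = len(H[0])
    identity = [[1.0 if i == j else 0.0 for j in range(nt)] for i in range(nt)]
    G = [row[:] for row in H] + identity  # augmented matrix [H; I]
    return [0.5 * math.log2(r * r) for r in qr_diagonal(G)]


def mutual_information(H):
    """(1/2) log2 det(I + H^T H) for the same assumed model."""
    nr, nt = len(H), len(H[0])
    gram = [
        [(1.0 if i == j else 0.0) + sum(H[k][i] * H[k][j] for k in range(nr))
         for j in range(nt)]
        for i in range(nt)
    ]
    return 0.5 * math.log2(determinant(gram))


if __name__ == "__main__":
    rng = random.Random(0)
    H = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
    print("per-stream rates:", sic_rates(H))
    print("sum of rates:    ", sum(sic_rates(H)))
    print("mutual info:     ", mutual_information(H))
```

In this single-user sketch the diagonal entries are generally unequal, so each stream would need a code of a different rate. The joint triangularization described above additionally makes the diagonals of all users' matrices equal up to a scalar factor, which is what allows the same scalar codebooks to serve every user.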