This paper presents a high-throughput, area-efficient, and energy-efficient iterative detection and decoding (IDD) receiver for low-density parity-check (LDPC)-coded multiple-input multiple-output (MIMO) systems. A layered non-resetting IDD technique is used to minimize the number of inner iterations needed for a required error performance. An area-efficient minimum mean-square error with parallel interference cancellation (MMSE-PIC) detector is devised to simplify matrix inversion. A detector-decoder interface is proposed to exchange soft messages efficiently. Given the throughput specifications, the inner and outer loops are optimally combined to maximize the error performance. The design specifications defined in the IEEE 802.11n standard are adopted as the design target. A 4 $\times$ 4 antenna configuration with BPSK, QPSK, and 16-QAM modulation is realized in silicon. Designs that support 64-QAM and 256-QAM modulation are also demonstrated for comparison with prior work. Fabricated in 40 nm technology, the chip integrates 998k logic gates in 1.33 ${\rm mm}^{2}$ and achieves a maximum throughput of 794 Mb/s. The chip dissipates 135 mW at 0.9 V, achieving an energy efficiency of 170 pJ/bit.
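As a consistency check on the reported figures, the energy efficiency can be recovered from the power and throughput numbers. This assumes that energy per bit is defined as power dissipation divided by maximum throughput, which the abstract does not state explicitly; the symbols $E_b$, $P$, and $T$ are introduced here only for illustration.

$$E_b = \frac{P}{T} = \frac{135 \times 10^{-3}\ {\rm W}}{794 \times 10^{6}\ {\rm b/s}} \approx 1.70 \times 10^{-10}\ {\rm J/bit} = 170\ {\rm pJ/bit}$$

Under that assumption, the computed value agrees with the stated 170 pJ/bit.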