Benefiting from tens of GHz of bandwidth, terahertz (THz) communication has become a promising technology for future 6G networks. To compensate for the severe propagation loss of THz signals, massive multiple-input multiple-output (MIMO) with hybrid precoding is used to generate directional beams with high array gains. However, the standard hybrid precoding architecture, which is based on frequency-independent phase shifters, cannot cope with the beam split effect in THz massive MIMO. This effect is caused by the large bandwidth and the large number of antennas, and it makes the beams at different frequencies point in different physical directions. The beam split effect results in a serious array gain loss across the entire bandwidth, which has not been well investigated in THz massive MIMO. In this paper, we first quantify the severity of the beam split effect in THz massive MIMO by analyzing the array gain loss it causes. Then, we propose a new precoding architecture called delay-phase precoding (DPP) to mitigate this effect. Specifically, the proposed DPP introduces a time delay network, composed of a small number of time delay elements, between the radio-frequency chains and the phase shifters of the standard hybrid precoding architecture. Unlike frequency-independent phase shifts, the time delay network introduced in the DPP can realize frequency-dependent phase shifts, which can be designed to generate frequency-dependent beams towards the target physical direction across the entire bandwidth. Owing to this joint control of delay and phase, the proposed DPP can alleviate the array gain loss caused by the beam split effect. Furthermore, we propose a hardware structure that uses true-time-delayers to realize the frequency-dependent phase shifts required by DPP.
A corresponding precoding algorithm is proposed to realize the precoding design. Theoretical analysis and simulations show that the proposed DPP can mitigate the beam split effect and achieve a near-optimal rate with higher energy efficiency.
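
To make the array gain loss concrete, the following is a minimal numerical sketch, not the precoding algorithm proposed in the paper. It assumes a uniform linear array with half-wavelength spacing at the center frequency, a single target direction, and ideal, unquantized delays. The function name `array_gain` and all parameter values (256 antennas, 16 delay elements, a 100 GHz carrier, 10 GHz of bandwidth) are hypothetical choices made only for illustration. Each delay element is modeled as adding a phase proportional to the frequency offset, shared by one group of adjacent antennas.

```python
import cmath
import math


def array_gain(num_antennas, num_delays, xi, sin_theta):
    """Normalized array gain toward the target direction (1.0 is the maximum).

    xi is the ratio f / fc of the subcarrier frequency to the center frequency.
    num_delays = 1 models frequency-independent phase shifters only.
    num_delays = num_antennas models one ideal delay element per antenna.
    Assumption: half-wavelength antenna spacing at fc, ideal delay elements.
    """
    if num_antennas % num_delays != 0:
        raise ValueError("num_antennas must be a multiple of num_delays")
    per_group = num_antennas // num_delays
    total = 0j
    for n in range(num_antennas):
        group = n // per_group
        # Phase of the array response at frequency f toward the target.
        channel_phase = math.pi * xi * n * sin_theta
        # Frequency-independent phase shifter, tuned at the center frequency.
        shifter_phase = math.pi * n * sin_theta
        # Frequency-dependent phase from the delay element shared by this group.
        delay_phase = math.pi * (xi - 1.0) * group * per_group * sin_theta
        total += cmath.exp(1j * (channel_phase - shifter_phase - delay_phase))
    return abs(total) / num_antennas


# Hypothetical setup: fc = 100 GHz, bandwidth 10 GHz, target at sin(theta) = 0.5.
fc, bandwidth = 100e9, 10e9
for f in (fc - bandwidth / 2, fc, fc + bandwidth / 2):
    xi = f / fc
    phase_only = array_gain(256, 1, xi, 0.5)
    delay_phase = array_gain(256, 16, xi, 0.5)
    print(f"f = {f / 1e9:.0f} GHz: phase-only {phase_only:.3f}, "
          f"delay-phase {delay_phase:.3f}")
```

Under these assumptions, the phase-only beamformer keeps its full gain only at the center frequency, while the grouped delays limit the residual phase mismatch to the antennas within one group. This is why a small number of delay elements can recover most of the gain at the band edges.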