Accurate determination of the representative elementary volume (REV) size is pivotal for analysing the mechanical properties and failure processes of heterogeneous rocks in complex engineering environments. In this study, a novel microstructure modelling strategy (NMMS) for determining the REV size is proposed that combines deep learning with an improved phase-field method (PFM). Micro- and macroscale experiments were systematically conducted to determine the real microstructural characteristics and mechanical properties of heterogeneous rocks with different grain sizes. On the basis of this experimental evidence, geometric models of different sizes were reconstructed through deep learning, which avoids the limitations of manual (human-based) methods, and the improved PFM was used for the numerical calculations. These models were then used in numerical tests under uniaxial loading conditions, and the coefficient of variation was introduced to determine the REV size of heterogeneous rocks with different grain sizes. The results indicate that the final REV size is the maximum of the REV sizes obtained for the individual evaluation properties, each determined within an acceptable coefficient of variation. At a criterion of 5% for the coefficient of variation, the REV sizes are 60 mm × 60 mm, 70 mm × 70 mm, and 90 mm × 90 mm for fine-medium-grained (FMG), medium-grained (MG), and coarse-grained (CG) rocks, respectively. Furthermore, the REV determined by the NMMS was applied to investigate the effects of microstructure on macromechanical properties and damage evolution under triaxial loading conditions. The numerical results show that the NMMS can accurately predict the macromechanical properties and microcracking patterns of heterogeneous rocks, especially the intracrystalline cracks in feldspar, the interfacial cracks in gravel, and the “voids” of cracks in biotite. This research provides a basic reference for selecting the REV size of heterogeneous rocks.
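The coefficient-of-variation criterion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each model size has several numerical realizations per evaluation property, that the coefficient of variation is the sample standard deviation divided by the mean, and that the REV for one property is the smallest size from which the coefficient of variation stays at or below the threshold for all larger sizes. The function names, the property names, and all numbers are hypothetical and are not taken from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (assumed definition)."""
    return statistics.stdev(values) / abs(statistics.fmean(values))


def rev_size_for_property(results_by_size, threshold=0.05):
    """Smallest size whose CoV, and that of every larger size, is <= threshold.

    results_by_size maps a model edge length (mm) to a list of values of one
    evaluation property obtained from repeated realizations at that size.
    Returns None if even the largest size fails the criterion.
    """
    rev = None
    # Walk from the largest size downwards and stop at the first failure.
    for size in sorted(results_by_size, reverse=True):
        if coefficient_of_variation(results_by_size[size]) <= threshold:
            rev = size
        else:
            break
    return rev


def final_rev_size(results_by_property, threshold=0.05):
    """Final REV size: the maximum of the per-property REV sizes."""
    revs = [rev_size_for_property(r, threshold) for r in results_by_property.values()]
    if any(r is None for r in revs):
        return None
    return max(revs)


# Made-up illustrative data (NOT results from the study).
example_results = {
    "peak_strength": {40: [90, 110, 100], 50: [98, 102, 100], 60: [99, 101, 100]},
    "elastic_modulus": {40: [20, 30, 25], 50: [22, 28, 25], 60: [24.5, 25.5, 25]},
}

if __name__ == "__main__":
    print(final_rev_size(example_results, threshold=0.05))
```

With these made-up numbers, the hypothetical strength property satisfies the 5% criterion from 50 mm and the hypothetical modulus property only from 60 mm, so the final REV size is 60 mm, the larger of the two.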