Process-based phenological models use a thermal requirement (TR), defined by planting date, temperature and photoperiod, to predict crop developmental stages. The TR value for a specific developmental stage of a given variety is often presumed to be constant regardless of environmental conditions. We calibrated and compared 12 phenological models using 27 years of observation data for a single rice (Oryza sativa L.) variety ('Shanyou63') from 46 sites in southern China. Our findings indicated that shifts in environmental conditions significantly affected TR values; e.g., standard deviations of TR at physiological maturation ranged from 83 to 167 °C d. Clustering sites together reduced environmental heterogeneity and thus reduced the differences in TR for the different phenological stages. When the number of clusters increased from 1 to 24, simulation errors for the 12 models decreased significantly across all developmental stages: from 1.8 to 1.4 days for tillering, from 5.3 to 3.7 days for jointing, from 5.6 to 3.9 days for booting, from 4.7 to 3.3 days for heading, and from 6.6 to 3.7 days for physiological maturation. Furthermore, models featuring a three-segment piecewise linear temperature response function gave more precise predictions, whereas models incorporating a Beta temperature response function did not perform well. This difference is attributed to the different mechanisms used to describe the response of crop development rate to temperature, particularly at non-optimum temperatures. The impact of the photoperiod response function on prediction accuracy became significant as the scale expanded. Our results demonstrate that the clustering method effectively compensates for the lack of crop adaptation processes in common phenological models, leading to significantly improved phenological prediction accuracy in regions with environmental diversity.
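
To make the notion of a thermal requirement concrete, the following is a minimal sketch of how a TR-based model with a three-segment piecewise linear temperature response might predict the day a stage is reached. It is an illustration under stated assumptions, not the calibrated models compared in this study: the function names, the cardinal temperatures (8, 30 and 42 °C) and the omission of any photoperiod response are all hypothetical choices made for the example.

```python
from typing import Optional, Sequence


def three_segment_response(t_mean: float,
                           t_base: float = 8.0,   # assumed base temperature (°C)
                           t_opt: float = 30.0,   # assumed optimum temperature (°C)
                           t_max: float = 42.0    # assumed ceiling temperature (°C)
                           ) -> float:
    """Daily effective temperature (°C d) from a three-segment linear response.

    Segment 1: zero at or below t_base (and at or above t_max).
    Segment 2: rises linearly from t_base to t_opt.
    Segment 3: declines linearly from t_opt to zero at t_max.
    """
    if t_mean <= t_base or t_mean >= t_max:
        return 0.0
    if t_mean <= t_opt:
        return t_mean - t_base
    return (t_opt - t_base) * (t_max - t_mean) / (t_max - t_opt)


def days_to_stage(daily_mean_temps: Sequence[float],
                  thermal_requirement: float) -> Optional[int]:
    """Return the first day (1-indexed) on which accumulated thermal time
    reaches the thermal requirement, or None if it is never reached."""
    accumulated = 0.0
    for day, t_mean in enumerate(daily_mean_temps, start=1):
        accumulated += three_segment_response(t_mean)
        if accumulated >= thermal_requirement:
            return day
    return None
```

In this sketch, a constant TR corresponds to passing the same `thermal_requirement` for every site, whereas a clustered calibration would correspond to estimating one value per group of environmentally similar sites.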