Context
China produces more than 20 % of the world's maize grain, and Northeast China (NEC) accounts for ∼30 % of the nation's total maize production. Previous studies have used climate data, satellite data, or crop growth models (CGMs) to predict or forecast maize yield. However, maize in NEC is highly susceptible to extreme climate events (such as drought and heat), and studies that predict or forecast maize yield by integrating climate data, satellite data, extreme climate events, and CGM-simulated data are lacking.

Objective
We aim to develop a hybrid machine-learning approach that blends different sources of data (climate data, satellite data, extreme climate events) with process-based modelling results to improve the accuracy of maize yield prediction in NEC.

Methods
Using maize data from 44 sites in NEC for the period 2000–2013, we first optimized the Agricultural Production Systems sIMulator (APSIM) using the Differential Evolution Adaptive Metropolis algorithm combined with a Gaussian likelihood function and a Bayesian multiplication method. Next, we divided the growing season into five phases and selected variables for each phase using exploratory data analysis and Random Forest. Then, we developed a hybrid model using Random Forest, blending multiple sources of data with APSIM simulations, to predict maize yield from the start to the end of the growing season, and quantified the relative contribution of each predictor.

Results and conclusions
A hybrid model developed with Random Forest by combining climate data, the normalized difference vegetation index (NDVI), extreme climate events, and APSIM simulations can achieve high performance for predicting yield toward the end of the growing season. The accuracy of in-season yield prediction increased linearly from the start to the end of the growing season, with the mean absolute percentage error (MAPE) falling from 19 % to 13 % and the Kling-Gupta efficiency (KGE) rising from 0.05 to 0.53. Yield forecasts are acceptable approximately one month before harvest, with a root mean square error (RMSE) of 1.20 Mg ha−1 and a mean absolute error (MAE) of 1.01 Mg ha−1 (16 % and 13 % of the observed mean yield, respectively).
The most important predictor of the yield forecast was APSIM-simulated biomass or yield, and the most important extreme climate event was drought during the early grain-filling stage.

Significance
With the increasing availability of crop-related data, we expect that the in-season forecasting capacity of the proposed methodology can be further improved, and that the methodology can be extended to yield forecasting for other crops and regions.
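
The accuracy measures reported in the results (RMSE, MAE, MAPE, and KGE) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the original Kling-Gupta formulation (correlation, variability ratio, and bias ratio weighted equally), which the abstract does not specify, and the function names and the yield values in the usage example are hypothetical.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def rmse(obs, sim):
    """Root mean square error, in the units of the inputs (e.g. Mg ha-1)."""
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error, in the units of the inputs."""
    return _mean([abs(s - o) for o, s in zip(obs, sim)])


def mape(obs, sim):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 * _mean([abs((s - o) / o) for o, s in zip(obs, sim)])


def kge(obs, sim):
    """Kling-Gupta efficiency, assumed form: 1 - sqrt((r-1)^2 + (a-1)^2 + (b-1)^2).

    r is the Pearson correlation, a the ratio of standard deviations
    (simulated / observed), and b the ratio of means (simulated / observed).
    A perfect prediction gives KGE = 1.
    """
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    sd_obs = math.sqrt(_mean([(o - mean_obs) ** 2 for o in obs]))
    sd_sim = math.sqrt(_mean([(s - mean_sim) ** 2 for s in sim]))
    cov = _mean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (sd_obs * sd_sim)
    alpha = sd_sim / sd_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Hypothetical observed and predicted yields (Mg ha-1), for illustration only.
    observed = [6.0, 7.5, 8.0, 9.0]
    predicted = [6.5, 7.0, 8.5, 8.5]
    print("RMSE:", rmse(observed, predicted))
    print("MAE:", mae(observed, predicted))
    print("MAPE (%):", mape(observed, predicted))
    print("KGE:", kge(observed, predicted))
```

RMSE and MAE carry the units of yield, which is why the results also express them as a share of the observed mean yield, whereas MAPE and KGE are dimensionless and can be compared across sites with different yield levels.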