Real-time crash risk prediction is an effective means of enhancing traffic safety, but it has not been fully explored for freeway tunnels. This study proposes a two-stage deep learning modeling framework, comprising a preliminary exploration stage and a prediction and analysis stage, for real-time crash risk prediction in freeway tunnels. In the preliminary exploration stage, a random parameters logit model with heterogeneity in means and variances is used to investigate unobserved heterogeneity and the mechanisms by which precursors influence real-time crash risk. In the prediction and analysis stage, a random deep and cross network model that accounts for feature interactions and unobserved heterogeneity is developed to predict and analyze real-time crash risk, and its predictions are interpreted with the Shapley additive explanations (SHAP) approach. A multi-source fusion dataset, collected from the Caltrans Performance Measurement System (PeMS) and a weather information website, is used to validate the proposed framework. Results reveal that: (1) the random parameters logit model with heterogeneity in means and variances outperforms the traditional logit model in terms of model fit, which suggests that addressing heterogeneity may also improve the performance of deep learning models; (2) important crash precursors, such as the average difference in speed between the detectors at the tunnel entrance and exit, are identified through the marginal effect analysis of the random parameters logit model with heterogeneity in means and variances; (3) the random deep and cross network model yields the best prediction performance among the data-driven models compared, demonstrating the strength of deep learning models for real-time crash risk prediction and indicating that considering feature interactions and heterogeneity in deep learning modeling can improve prediction performance; and (4) the important precursors identified in the random deep and cross network model by the SHAP approach are close to those found by the statistical model, indicating that the proposed deep learning model can capture precursor effects similar to those captured by statistical models, and that precursor interactions and heterogeneity can also be observed through the SHAP approach.