Real-time conflict prediction at signalized intersections is crucial for urban road safety management. This study develops a real-time conflict prediction framework for signalized intersections that combines real-time video recognition with deep learning and incorporates lane-level information and feature interactions. The framework consists of three stages: real-time video data extraction and processing, development of a Deep and Cross Network (DCN)-based real-time traffic conflict prediction model, and interpretability analysis of conflict-driving factors using SHapley Additive exPlanations (SHAP). In the first stage, an efficient automated trajectory extraction system obtains vehicle trajectories in real time, from which dynamic traffic parameters and conflict frequencies are derived. In the second stage, a DCN model captures the relationships between dynamic traffic parameters, including their interactions, and traffic conflicts. In the third stage, SHAP is used to explore how different dynamic traffic parameters affect traffic conflicts. The model’s predictive performance and interpretability are evaluated using intersection video data from Changsha, China. The results show that: 1) for real-time traffic conflict prediction at signalized intersections across different modified time-to-conflict thresholds (1.5 s and 3.0 s), the DCN model consistently outperformed statistical and machine learning models; 2) high traffic flow on main and secondary roads at signalized intersections significantly increases the complexity and frequency of conflicts, with sensitivity varying according to the interaction of traffic flow, speed, and platoon length; 3) the proposed framework provides a safety measurement standard for data-driven road safety management methods.
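
As an illustration of how a DCN can model interactions among dynamic traffic parameters, the sketch below implements the cross layer of the standard DCN formulation, x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, in plain Python. It is a minimal sketch, not the implementation used in this study: the function names, the three-element feature vector (standing in for scaled flow, speed, and platoon length), and the weights are all hypothetical, and the deep branch and output layer of a full DCN are omitted.

```python
from typing import List, Sequence


def cross_layer(x0: Sequence[float], xl: Sequence[float],
                w: Sequence[float], b: Sequence[float]) -> List[float]:
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    # Scalar projection of the current layer input onto the weight vector.
    s = sum(xi * wi for xi, wi in zip(xl, w))
    # Scale the original input by that scalar, then add bias and residual.
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[Sequence[float]]) -> List[float]:
    """Stack cross layers; each layer raises the interaction order by one."""
    xl = list(x0)
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl


# Hypothetical scaled features: [traffic flow, mean speed, platoon length].
x0 = [1.0, 2.0, 3.0]
# Hypothetical (untrained) parameters for two cross layers.
weights = [[0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
biases = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

crossed = cross_network(x0, weights, biases)
print(crossed)
```

Because every layer multiplies the original input x_0 by a projection of the previous layer's output, two stacked layers already contain products of up to three input features, which is how explicit interaction terms such as flow × speed enter the model without manual feature engineering.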