Traditional broadcasting methods often lead to operator fatigue and decision-making errors when dealing with complex and diverse live content. Existing research on intelligent broadcasting relies mainly on preset rules and model-based decisions, which offer only limited understanding of emotional dynamics. To address these issues, this study proposes EmotionCast, an emotion-driven intelligent broadcasting system that improves the efficiency of camera switching during live broadcasts through decisions based on multimodal emotion recognition. First, the system employs sensing technologies to collect real-time video and audio data from multiple cameras and applies deep learning algorithms to detect emotion from facial expressions and vocal tone cues. Next, the visual, audio, and textual analyses are fused into an emotional score for each camera. Finally, the score of each camera shot at the current time point is computed by combining its current emotion score with the optimal scores from the preceding time window, which ensures optimal camera switching and enables swift responses to emotional changes. EmotionCast can be applied in diverse sensing environments such as sports events, concerts, and large-scale performances. Experimental results show that EmotionCast outperforms traditional broadcasting methods in switching accuracy, emotional resonance, and audience satisfaction, significantly enhancing viewers' emotional engagement.
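
To illustrate the switching rule described above, the following is a minimal sketch of a Viterbi-style dynamic program in which each camera's fused emotion score at the current step is combined with the best accumulated score from the preceding step, with a penalty discouraging unnecessary switches. The function name, switch penalty, and the simplification of the "preceding time window" to the immediately previous step are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def select_cameras(emotion_scores: List[List[float]],
                   switch_penalty: float = 0.1) -> List[int]:
    """Return the camera chosen for each time step.

    emotion_scores[t][c] is the fused (visual + audio + text) emotion score
    of camera c at time t; switch_penalty is a hypothetical parameter.
    """
    num_steps = len(emotion_scores)
    num_cams = len(emotion_scores[0])

    # best[t][c]: best accumulated score if camera c is on air at time t
    best = [[0.0] * num_cams for _ in range(num_steps)]
    back = [[0] * num_cams for _ in range(num_steps)]
    best[0] = list(emotion_scores[0])

    for t in range(1, num_steps):
        for c in range(num_cams):
            # Combine the current emotion score with the optimal accumulated
            # score so far, penalizing a change of camera.
            prev_best, prev_cam = max(
                (best[t - 1][p] - (switch_penalty if p != c else 0.0), p)
                for p in range(num_cams)
            )
            best[t][c] = emotion_scores[t][c] + prev_best
            back[t][c] = prev_cam

    # Backtrack from the best final camera to recover the switching sequence.
    sequence = [0] * num_steps
    sequence[-1] = max(range(num_cams), key=lambda c: best[-1][c])
    for t in range(num_steps - 1, 0, -1):
        sequence[t - 1] = back[t][sequence[t]]
    return sequence


if __name__ == "__main__":
    # Three cameras, five time steps of hypothetical fused emotion scores.
    demo = [[0.2, 0.8, 0.1],
            [0.3, 0.7, 0.2],
            [0.9, 0.4, 0.1],
            [0.8, 0.3, 0.2],
            [0.1, 0.2, 0.9]]
    print(select_cameras(demo))  # -> [1, 1, 0, 0, 2]
```

The switch penalty in this sketch plays the role of favoring shot continuity; in the actual system, the per-camera emotion scores would come from the multimodal fusion stage rather than a hand-written list.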