Facial expressions are one of the most essential channels for communicating a person’s emotional state. In social interaction, the capability to accurately read subtle changes in facial expressions, which reveal emotional fluctuations, is critical for 1) comprehending others’ emotions in their context and background situations, 2) responding appropriately to others’ emotions, and 3) developing social skills in human–computer interaction. In this paper, we introduce automatic emotion change detection via facial expressions, the task of discovering the temporal locations in a video at which facial expression changes significantly. We propose a weakly-supervised deep emotion change detection framework that does not require facial expression videos with expensive temporal annotations and instead learns from static images. We performed extensive experiments that provide fundamental insights into emotion change detection and demonstrate the efficacy of our framework on three video datasets, i.e., CASME II, MMI, and our YoutubeECD. Furthermore, we adapted our framework to temporal spotting, the task most closely related to emotion change detection, and achieved results comparable to state-of-the-art methods on CAS(ME)2, justifying the problem formulation. Even though we employed only AffectNet to train our framework, rather than CASME II, MMI, YoutubeECD, or CAS(ME)2, the experimental results demonstrate its exceptional generalization capability in cross-dataset settings.