Speech emotion recognition has great potential in human-computer interaction applications in fields such as psychology, psychiatry, and affective computing. The great majority of research on speech emotion recognition has relied on corpora of short sentences recorded under laboratory conditions. In this work, we investigate the use of the Emotional Temperature strategy for continuous tracking in long speech samples in which the emotion changes over the course of the speech. Emotional Temperature uses a small set of prosodic and paralinguistic features obtained from a temporal segmentation of the speech signal. The simplicity and small size of this feature set, previously validated under laboratory conditions, make it suitable for use under real conditions, where spontaneous speech is continuous and emotions are expressed at certain moments of the dialogue, giving rise to emotional turns. The strategy is robust, has a low computational cost, is able to detect emotional changes, and improves on the performance of a segmentation based on linguistic aspects. The German corpus EMO-DB (Berlin Database of Emotional Speech), the English corpus LDC (Emotional Prosody Speech and Transcripts database), the Polish Emotional Speech Database, and the RECOLA (Remote Collaborative and Affective Interactions) database are used to validate the system for continuous tracking of emotional speech. Two experimental conditions are analyzed, dependence on and independence of language and gender, using acted and spontaneous speech, respectively. Under acted conditions, the approach obtained accuracies of 67–97%, while under spontaneous conditions, evaluated against annotations made by human judges, it obtained accuracies of 41–50%. Compared with previous studies in continuous emotion recognition, the approach improves on existing results, with an accuracy that is 9% higher on average.
The approach therefore combines good performance with low complexity, making it suitable for developing real-time applications and applications for continuous tracking of emotional speech.