This paper provides a step-by-step introduction to real-time speech emotion recognition (SER) using a pre-trained image classification network. The procedure has low computational requirements and can be implemented on various voice-based communication platforms, such as mobile phones, call centers, and online communication services. The effects of reduced speech bandwidth and of the mu-law companding procedure used in transmission systems on SER accuracy are examined. The baseline approach achieved an average accuracy of 82% when trained on the Berlin Emotional Speech (EMO-DB) database with seven categorical emotions. Reducing the sampling frequency from the baseline 16 kHz to 8 kHz (i.e., reducing the bandwidth from 8 kHz to 4 kHz) decreased SER accuracy by about 3.3%. The companding procedure alone reduced accuracy by 3.8%, and the combination of both factors reduced the average accuracy by about 7% relative to the baseline. The SER system ran in real time, generating emotional labels every 1.026 to 1.033 seconds. Real-time implementation timelines are presented.
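The mu-law companding transform referred to above can be illustrated with a minimal sketch. The function names and the NumPy-based formulation below are illustrative assumptions, not the paper's implementation; the mu = 255 constant follows the standard ITU-T G.711 convention for 8-bit telephony.

```python
import numpy as np

def mu_law_compand(x, mu=255):
    # Forward mu-law compression of samples normalized to [-1, 1].
    # y = sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255):
    # Inverse (expansion) transform, recovering the original amplitude.
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

The forward transform compresses the dynamic range before quantization, which is what distorts the speech signal relative to the 16 kHz linear-PCM baseline; applying the expansion does not undo the quantization loss introduced in between.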