Determining the emotion of a speaker from their speech utterance by a machine is referred to as Speech Emotion Recognition (SER). SER can greatly enhance the human-machine interaction experience; however, such systems suffer from poor performance due to factors like variations in emotion intensity, speaker, language, and culture. In this study, we use the RAVDESS dataset, which consists of speech utterances with two emotional intensities, strong and normal, raising the recognition difficulty for the development of an efficient SER framework. This research work proposes Gender Dependent Training for building the emotion detection models of the SER system. The proposed system is less complex and demonstrates good performance using only MFCC features and their variants, namely delta MFCC and delta-delta MFCC, when compared with the baseline system, which utilized five different features: MFCC, Mel spectrogram, Chromagram, Spectral Contrast, and Tonnetz. When the recognition accuracy is averaged over six emotions (Sad, Fearful, Calm, Surprised, Disgust, and Happy), the proposed system shows an improvement of 6.90% over the considered baseline system.
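As a minimal sketch of the feature set the abstract describes, the snippet below computes delta and delta-delta variants from a given MFCC matrix using the standard first-order regression formula over a sliding window of frames. The MFCC matrix itself is a hypothetical placeholder (in practice it would be extracted from audio, e.g. with a library such as librosa); the function name `delta` and the half-window size `N=2` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def delta(feat, N=2):
    """Delta features via first-order regression over frames.

    feat: (n_frames, n_coeffs) feature matrix (e.g. MFCCs)
    N: half-window size (an assumed, commonly used value)
    """
    # Normalizer: 2 * sum of squared offsets 1..N
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Pad by repeating edge frames so every frame has a full window
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return out

# Hypothetical MFCC matrix: 5 frames x 3 coefficients (stand-in data)
mfcc = np.arange(15, dtype=float).reshape(5, 3)
d1 = delta(mfcc)   # delta MFCC
d2 = delta(d1)     # delta-delta MFCC
# Stack static + delta + delta-delta per frame, as in typical SER pipelines
features = np.hstack([mfcc, d1, d2])
```

For the linear ramp used here, interior frames get a delta equal to the frame-to-frame slope (3.0 per coefficient), while edge frames are attenuated by the repeated-edge padding.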