Human cognitive functions such as perception, attention, learning, memory, reasoning, and problem-solving are all significantly influenced by emotion. Emotion has a particularly potent impact on attention, modifying its selectivity in particular and influencing behavior and action motivation. Artificial Emotional Intelligence (AEI) technologies enable computers to understand a user's emotional state and respond appropriately. These systems enable a realistic dialogue between people and machines. The current generation of adaptive user interference technologies is built on techniques from data analytics and machine learning (ML), namely deep learning (DL) artificial neural networks (ANN) from multimodal data, such as videos of facial expressions, stance, and gesture, voice, and bio-physiological data (such as eye movement, ECG, respiration, EEG, FMRT, EMG, eye tracking). In this study, we reviewed existing literature based on ML and data analytics techniques being used to detect emotions in speech. The efficacy of data analytics and ML techniques in this unique area of multimodal data processing and extracting emotions from speech. This study analyzes how emotional chatbots, facial expressions, images, and social media texts can be effective in detecting emotions. PRISMA methodology is used to review the existing survey. Support Vector Machines (SVM), Naïve Bayes (NB), Random Forests (RF), Recurrent Neural Networks (RNN), Logistic Regression (LR), etc., are commonly used ML techniques for emotion extraction purposes. This study provides a new taxonomy about the application of ML in SER. The result shows that Long-Short Term Memory (LSTM) and Convolutional Neural Networks (CNN) are found to be the most useful methodology for this purpose.