Facial expression recognition (FER) has been an active research topic in recent years. Facial expressions are a direct way in which humans communicate their emotions, and identifying these emotions requires complex cognitive skills. Owing to continual advances in image processing and machine learning, several novel strategies are presented each year; however, their performance still needs to be improved, which remains a challenging task. We therefore propose a novel method, the Ensemble Convolutional Recurrent Neural Network (CRNN), to enhance the prediction accuracy of facial expression recognition. The prediction outputs of GRU, LSTM, and CNN sub-models are incorporated, and an Adaptive Neuro-Fuzzy Inference System (ANFIS) is constructed as the topmost layer of the ensemble to adequately examine these sub-models. The final output is predicted by the fuzzy inference system. The Facial Recognition dataset and the EMOTIC dataset are used to evaluate the method. Several performance metrics are used to test the efficiency of the proposed model: precision, false positive rate, F1-score, true positive rate, and accuracy. The proposed method achieves 99.52% precision and a 99.35% F1-score. Compared with other existing methods, the proposed method achieves 99.75%, and the area under the ROC curve (AUC) is 0.95.
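
For reference, the metrics named above follow standard definitions over confusion-matrix counts. The sketch below is a minimal illustration, assuming a one-vs-rest binary view of a single expression class; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the proposed method or its results.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard metrics from binary confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives,
    true negatives, false negatives (hypothetical inputs).
    """
    def safe_div(num: float, den: float) -> float:
        # Return 0.0 when the denominator is zero to avoid a ZeroDivisionError.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    tpr = safe_div(tp, tp + fn)            # true positive rate (recall)
    fpr = safe_div(fp, fp + tn)            # false positive rate
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    return {
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Illustrative counts only; not data from the evaluation.
    for name, value in binary_metrics(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{name}: {value:.4f}")
```

For a multi-class problem such as FER, these per-class values would typically be averaged across classes; which averaging scheme the reported figures use is not stated here.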