Emotions arise whenever people communicate, and different emotions affect communication in different ways. Outward cues that accompany emotional expression, such as emotional speech signals or facial expressions, help people communicate with and understand one another. Emotion recognition is an important branch of affective computing and a research focus in signal processing, pattern recognition, artificial intelligence, and human-computer interaction. Emotions convey important information in human communication and interaction. Research on emotion recognition began at the end of the last century, and considerable effort has since been devoted to correctly identifying emotion categories. This paper introduces multi-modal emotion recognition based on facial expressions and speech and applies it to research on adaptive higher education management. Speech and facial expression are the most direct ways in which people express their emotions. Within the framework of the dual-modal emotion recognition system, a bag-of-words (BOW) model is used to characterize the movement of local regions or key points. For 1,000 audio samples, the recognition rates for anger, disgust, fear, happiness, sadness, and surprise are 97.3%, 83.75%, 64.87%, 89.87%, 84.12%, and 86.68%, respectively.
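The following is a minimal sketch of the bag-of-words encoding step referenced above, assuming local descriptors (e.g., vectors extracted around facial key points or from short speech frames) are already available; the codebook size, descriptor dimension, and the use of scikit-learn's KMeans are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: bag-of-words (bag-of-visual-words) encoding of local descriptors.
# Assumes descriptors are already extracted as fixed-length vectors (hypothetical
# sizes below); not the authors' exact pipeline.
import numpy as np
from sklearn.cluster import KMeans


def build_codebook(descriptors, n_words=64, seed=0):
    """Cluster pooled local descriptors into a vocabulary of 'words'."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(descriptors)


def bow_histogram(descriptors, codebook):
    """Encode one sample's descriptors as a normalized word-frequency histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


# Example usage with synthetic data: 500 training descriptors of dimension 32.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(500, 32))
codebook = build_codebook(train_descriptors)

# Encode one new sample (e.g., 40 key-point descriptors from a single clip)
# into a fixed-length feature vector suitable for an emotion classifier.
sample = rng.normal(size=(40, 32))
feature = bow_histogram(sample, codebook)
```

The fixed-length histogram produced this way can then be fed to any standard classifier, which is what makes the BOW representation convenient for combining local facial or acoustic features across samples of varying length.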