With the advent of e-learning, understanding student engagement and reactions is crucial for improving the quality of education and the rate at which students learn. Advances in computer vision offer a significant opportunity to analyze and interpret student reactions non-intrusively. This study proposes a novel framework that integrates Faster R-CNN with a DenseNet backbone for real-time detection of student facial reactions during e-learning sessions. The method combines the strength of Faster R-CNN in generating high-quality region proposals for object detection with the efficient feature propagation and reduced parameter count of DenseNet, which suit the intricate patterns of facial expressions. Our approach first applies Faster R-CNN to extract candidate facial regions with high accuracy and at reduced computational cost. Using DenseNet as the feature-extraction backbone within Faster R-CNN exploits its dense connectivity, in which each layer receives the feature maps of all preceding layers in its block, maximizing information flow through the network. This makes the system adept at recognizing subtle changes in facial features that indicate student reactions such as confusion, engagement, or boredom. We conducted a series of experiments on a diverse dataset of e-learning interactions, collected under varied lighting conditions and including participants of multiple ethnicities to support robustness and generalizability. The model was trained and validated on this dataset, and the results demonstrate a significant improvement in the detection rate of student reactions compared with existing methods.
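The dense connectivity described above can be illustrated with a little channel arithmetic. The following is a minimal sketch, not the implementation used in this study: the abstract does not state which DenseNet variant serves as the backbone, so the DenseNet-121 settings used here (64 initial feature maps, growth rate 32, block sizes 6, 12, 24, 16, and transition layers that halve the channel count) are an assumption, and the function names are hypothetical. It shows how each layer's input widens as earlier feature maps are concatenated, and what feature-map depth such a backbone would hand to a detection head.

```python
from typing import List, Sequence, Tuple


def dense_block_channels(
    in_channels: int, num_layers: int, growth_rate: int
) -> Tuple[List[int], int]:
    """Return (input width seen by each layer, output width of the block).

    Every layer receives the concatenation of the block input and the
    outputs of all earlier layers, and contributes `growth_rate` new maps.
    """
    inputs_per_layer = []
    channels = in_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)
        channels += growth_rate  # this layer's output is concatenated on
    return inputs_per_layer, channels


def direct_connections(num_layers: int) -> int:
    """Number of direct connections among L densely connected layers: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def backbone_output_channels(
    init_features: int,
    block_config: Sequence[int],
    growth_rate: int,
    compression: float = 0.5,
) -> int:
    """Channel depth after the last dense block (no transition after it)."""
    channels = init_features
    for i, num_layers in enumerate(block_config):
        _, channels = dense_block_channels(channels, num_layers, growth_rate)
        if i != len(block_config) - 1:
            # Transition layer between blocks reduces the channel count.
            channels = int(channels * compression)
    return channels


# Assumed DenseNet-121 settings (not stated in the abstract).
ASSUMED_INIT_FEATURES = 64
ASSUMED_GROWTH_RATE = 32
ASSUMED_BLOCK_CONFIG = (6, 12, 24, 16)

first_block_inputs, first_block_out = dense_block_channels(
    ASSUMED_INIT_FEATURES, ASSUMED_BLOCK_CONFIG[0], ASSUMED_GROWTH_RATE
)
print(first_block_inputs)  # [64, 96, 128, 160, 192, 224]
print(first_block_out)     # 256
print(direct_connections(ASSUMED_BLOCK_CONFIG[0]))  # 21
print(backbone_output_channels(
    ASSUMED_INIT_FEATURES, ASSUMED_BLOCK_CONFIG, ASSUMED_GROWTH_RATE
))  # 1024
```

Under these assumed settings, each layer adds only 32 feature maps while still seeing everything computed before it in the block, which is the source of the parameter efficiency the abstract refers to. The final depth of 1024 channels is the value a region proposal network and detection head built on such a backbone would need to be configured for.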