Online fatigue estimation is in demand because fatigue can impair the health of college students and lower the quality of higher education. Monitoring students' fatigue is therefore essential to reduce its adverse effects on their health and academic performance. However, previous studies on student fatigue monitoring are mainly survey-based with offline analysis rather than continuous monitoring. We therefore propose an explainable student fatigue estimation model based on joint facial representation. The model comprises two modules: a spatial-temporal symptom classification module and a data-experience joint status inference module. The first module tracks a student's face and uses a deep convolutional neural network (CNN) to generate spatial-temporal features for classifying the relevant abnormal symptoms; the second module infers the student's status from the symptom classification results by maximum a posteriori (MAP) estimation under data-experience joint constraints. The model was trained on the benchmark NTHU Driver Drowsiness Detection (NTHU-DDD) dataset and tested on an Online Student Fatigue Monitoring (OSFM) dataset. Under the same training-testing setting, our method outperformed the compared methods, reaching an accuracy of 94.47%. These results are relevant to real-time monitoring of students' fatigue states during online classes and could also inform practical strategies for in-person education.
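
The MAP step of the second module can be illustrated with a minimal sketch. This is not the authors' implementation: the state names, symptom names, prior, and likelihood values below are all hypothetical, and the symptoms are assumed to be conditionally independent given the state, which the abstract does not specify. The prior stands in for the "experience" constraint and the likelihoods for the "data" constraint.

```python
import math

# Hypothetical prior over student states (the "experience" term).
# Values are illustrative only, not taken from the paper.
PRIOR = {"alert": 0.7, "fatigued": 0.3}

# Hypothetical likelihoods P(symptom present | state) (the "data" term).
LIKELIHOOD = {
    "alert": {"yawning": 0.05, "eyes_closed": 0.10, "nodding": 0.05},
    "fatigued": {"yawning": 0.40, "eyes_closed": 0.60, "nodding": 0.50},
}


def map_state(symptoms):
    """Return the MAP state and the unnormalized log-posterior of each state.

    `symptoms` maps each symptom name to True/False, as a symptom
    classifier might output. Symptoms are assumed conditionally
    independent given the state (a simplifying assumption).
    """
    scores = {}
    for state, prior in PRIOR.items():
        log_post = math.log(prior)
        for name, present in symptoms.items():
            p = LIKELIHOOD[state][name]
            log_post += math.log(p if present else 1.0 - p)
        scores[state] = log_post
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    observed = {"yawning": True, "eyes_closed": True, "nodding": False}
    state, scores = map_state(observed)
    print(state, scores)
```

Working in log space avoids numerical underflow when many symptoms are combined. Because the evidence term is the same for every state, the unnormalized log-posterior is enough to pick the maximizer.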