Facial expression is one of the most important forms of non-verbal communication and is associated with human perceptions and behaviors. This study aims to develop personal thermal comfort models using facial video and to examine the association between facial muscle movement and individual thermal comfort. We collected facial videos of occupants over several weeks in residential buildings in the U.S. We then extracted lower-dimensional facial behavior features from the videos, engineered the features with different computational approaches, and trained machine-learning algorithms to understand the occupants' heterogeneous thermal perception. The experimental results show that the median prediction performance reached an accuracy of 77.78% and an Area under the ROC Curve (AUC) of 0.8899. We found that the action units (AUs) related to negative emotions such as anger or fear strongly influence the detection of thermal perception. In addition, the test accuracy exceeded 74% even when the predictors were the averages of the facial behavior feature sequences, the simplest engineering approach in our study; however, we often observed that leveraging wave-like patterns or principal components exceeded that performance. This study provides insight into better modeling of personal thermal comfort using cost-effective and less intrusive data in real-life settings.
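The two feature-engineering approaches mentioned above, averaging each facial behavior feature sequence and projecting it onto principal components, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the array shapes, the number of AU channels, and the synthetic data are assumptions, with AU intensity time series of the kind produced by tools such as OpenFace in mind.

```python
import numpy as np

# Hypothetical input: intensities of 17 facial action-unit (AU) channels
# tracked over 300 video frames (synthetic data for illustration only).
rng = np.random.default_rng(0)
au_sequence = rng.random((300, 17))  # frames x AU channels

# Simplest approach: average each AU channel over the whole sequence,
# yielding a single 17-dimensional predictor vector per video clip.
mean_features = au_sequence.mean(axis=0)

# Alternative approach: project the mean-centered sequence onto its
# leading principal components (computed here via SVD), keeping e.g. 5.
centered = au_sequence - mean_features
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc_features = centered @ vt[:5].T  # per-frame scores on 5 components

print(mean_features.shape)  # one value per AU channel
print(pc_features.shape)    # one score per frame per retained component
```

Either representation can then be fed to a classifier that predicts an occupant's thermal perception label.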