Thermal comfort models that account for individual thermal sensations are crucial for optimizing environmental control while maintaining occupant comfort. However, the current practice of developing occupant-specific models does not scale to new occupants whose data were not used for training, which makes it impractical for multi-occupant spaces and burdensome to manage. To address this, we propose a thermal comfort model built on a subject-independent evaluation approach, so that it generalizes to individuals not included in model training. The model accurately predicts thermal sensations without requiring occupant-specific models, which allows it to scale and adapt to new occupants. Furthermore, this study considers occupants wearing face masks, which is essential for occupant wellbeing and performance in environments where masks are required. The model was developed using the Compact Convolutional Transformer, which combines convolutional and transformer layers to capture both fine-grained local features and broader contextual relationships, making it effective for predicting thermal comfort from thermal images. The model was trained and evaluated using the Leave-One-Subject-Out (LOSO) and Leave-One-Group-Out (LOGO) approaches. The LOSO approach achieved accuracies of 98.04 %, 98.93 %, and 98.88 % on dataset A (participants without face masks), dataset B (participants with face masks), and dataset C (a combination of both), respectively. The LOGO approach achieved accuracies of 99.92 %, 99.40 %, and 99.85 % on datasets A, B, and C, respectively, a slightly better prediction performance. The approach presented in this study offers a promising route to accurate, generalized models for real-time environmental control that meet the thermal comfort needs of occupants.
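
As an illustration of what subject-independent splitting means in practice, the following minimal sketch builds LOSO and LOGO folds in plain Python. It is not the authors' pipeline: the helper `leave_one_group_out`, the toy subject IDs, and the `subject_to_group` mapping are all hypothetical, and the abstract does not state how subjects were assigned to groups. The point is only that every image from a held-out subject (or group of subjects) is excluded from training in that fold.

```python
from collections import defaultdict


def leave_one_group_out(labels):
    """Yield (held_out_label, train_indices, test_indices) per distinct label.

    With subject IDs as labels this is Leave-One-Subject-Out (LOSO);
    with group IDs as labels it is Leave-One-Group-Out (LOGO).
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    for held_out in sorted(by_label):
        test_idx = by_label[held_out]
        train_idx = [i for i, lab in enumerate(labels) if lab != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: one subject ID per thermal image.
subjects = ["s1", "s1", "s2", "s2", "s3", "s4"]

# Hypothetical grouping of subjects (the grouping rule is an assumption).
subject_to_group = {"s1": "g1", "s2": "g1", "s3": "g2", "s4": "g2"}
groups = [subject_to_group[s] for s in subjects]

loso_folds = list(leave_one_group_out(subjects))  # one fold per subject
logo_folds = list(leave_one_group_out(groups))    # one fold per group

for held_out, train_idx, test_idx in loso_folds:
    # No subject may appear on both sides of a fold.
    train_subjects = {subjects[i] for i in train_idx}
    test_subjects = {subjects[i] for i in test_idx}
    assert train_subjects.isdisjoint(test_subjects)
```

In a real experiment, a model would be trained on `train_idx` and scored on `test_idx` for each fold, and the per-fold accuracies averaged. scikit-learn's `LeaveOneGroupOut` provides equivalent splitting when that library is available.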