Smart buildings optimize energy consumption and occupant comfort through Heating, Ventilation, and Air Conditioning (HVAC) and lighting management. Nevertheless, large venues require data fusion techniques to improve analysis and forecasting. This study evaluates the effectiveness of different feature fusion techniques, environmental sensors, and semi-supervised learning for estimating indoor occupancy in a 230 m² office. Using five Internet of Things devices measuring air temperature, relative humidity, and barometric pressure, data were collected for 99 days, with 6,800 entries (on average) and only 14% labeled. Eight feature selection methods were evaluated along with three supervised and two semi-supervised classification methods. Results indicate that the Chi-squared-based approach to feature fusion outperformed the others. Similarly, the semi-supervised Self-Training model performed better than the supervised methods. This research shows that combining semi-supervised learning and data fusion allows the occupancy level in large indoor spaces to be estimated with high accuracy and low labeling costs.

Highlights

This study pioneers the use of semi-supervised learning and distinct feature fusion methods for estimating indoor occupancy levels in a 230 m² open office using only Internet of Things (IoT) environmental sensors (air temperature, relative humidity, and barometric pressure).

A comprehensive comparison of statistical methods, feature selection, and dimensionality reduction techniques is conducted to determine their ability to generate robust feature fusion sets.

The feature fusion set selected through the Chi-squared test stood out with a high F1-score (average of 0.95) and an average accuracy of 0.99.

The Self-Training model achieved the best performance among the semi-supervised methods, with an average F1-score of 0.90 and an average accuracy of 0.97, based on a dataset with a large proportion of unlabeled data (16,847 entries) and only 9,367 labels.

For supervised learning, Random Forest achieved a high accuracy (average of 0.98) and F1-score (average of 0.93) across various feature sets.
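
To illustrate how Chi-squared feature selection and Self-Training can be combined, the following is a minimal sketch using scikit-learn. It is not the study's pipeline. The data are synthetic, and the column layout (15 fused features), the number of retained features (`k=4`), the confidence `threshold`, the three occupancy classes, and the choice of Random Forest as the base classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for fused sensor features (NOT the study's data):
# 15 hypothetical columns and 3 hypothetical occupancy levels.
n_samples, n_features, n_classes = 600, 15, 3
y_true = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :4] += 2.0 * y_true[:, None]  # only the first 4 columns carry signal

# chi2 requires non-negative inputs, so rescale every column to [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)

# Hide most labels; scikit-learn marks unlabeled samples with -1.
# The 14% labeled share mirrors the figure quoted in the abstract.
labeled = rng.random(n_samples) < 0.14
y_partial = np.where(labeled, y_true, -1)

# Score features with the Chi-squared test on the labeled rows only,
# then keep the k best columns (k=4 is an assumption).
selector = SelectKBest(chi2, k=4).fit(X_scaled[labeled], y_true[labeled])
selected = selector.get_support(indices=True)
X_sel = selector.transform(X_scaled)

# Self-Training: the base classifier is refit repeatedly, each round
# adding unlabeled rows whose predicted probability exceeds the threshold.
base = RandomForestClassifier(n_estimators=50, random_state=0)
model = SelfTrainingClassifier(base, threshold=0.8)
model.fit(X_sel, y_partial)

# Accuracy on the rows whose labels were hidden during training.
acc = float((model.predict(X_sel[~labeled]) == y_true[~labeled]).mean())
print("selected columns:", selected.tolist())
print("accuracy on hidden-label rows:", round(acc, 3))
```

In this sketch the selector is fitted on labeled rows only, since the Chi-squared test needs class labels, while the scaler and the Self-Training model also see the unlabeled rows. Evaluating on the hidden-label rows is a convenience of the synthetic setup. A real study would use a separate held-out test set.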