Reliable occupancy estimation is key to balancing energy use against occupant comfort and to improving energy efficiency in buildings. This study proposes a novel occupancy estimation model based on CO2 concentration data. A wavelet denoising method was applied to remove random noise from the CO2 sensor data, and a multi-grained scanning cascade forests (GcForest) method was then used to estimate the number of occupants. The GcForest model incorporated three different tree-based classifiers in each level, so that estimation performance could be enhanced by exploiting the complementarity among the different learning algorithms. To evaluate the effectiveness of the proposed model, a validation experiment was conducted in a university lab office, and the results were compared with those of the support vector machine (SVM), classification and regression tree (CART), and inhomogeneous hidden Markov model (IHMM) algorithms. The experimental results show that the wavelet denoising method filtered the noise while preserving the features of the CO2 concentration data. Compared with these algorithms, the proposed model achieved higher estimation accuracy, lower mean absolute error, and higher accuracy in detecting occupant presence/absence. The model also captured both the first arrival time and the last departure time. Because the maximum depths of the classifiers affected the GcForest model's performance, the results further show that a proper selection of the maximum depth combination could significantly improve the estimation accuracy.
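As a rough illustration of the denoising step, the following minimal sketch applies wavelet soft thresholding to a synthetic CO2 trace. The abstract does not state the wavelet family, the decomposition level, or the threshold rule, so the Haar wavelet, the three levels, and the universal threshold used here are assumptions chosen for simplicity. All function names and the synthetic data are hypothetical and are not taken from the study.

```python
import math
import random
import statistics


def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def wavelet_denoise(signal, levels=3):
    """Denoise by soft-thresholding Haar detail coefficients (assumed setup)."""
    n = len(signal)
    if n == 0 or n % (2 ** levels) != 0:
        raise ValueError("signal length must be a positive multiple of 2**levels")
    approx = list(signal)
    details = []  # details[0] is the finest scale
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    # Noise level estimated from the finest details via the median absolute value.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))  # universal threshold
    details = [soft_threshold(d, thr) for d in details]
    for d in reversed(details):  # rebuild from the coarsest scale upward
        approx = haar_inverse(approx, d)
    return approx


def demo(seed=0):
    """Return (MSE of noisy trace, MSE of denoised trace) on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical CO2 trace in ppm: an empty room, then an occupied room.
    clean = [450.0] * 64 + [800.0] * 64
    noisy = [c + rng.gauss(0.0, 15.0) for c in clean]
    denoised = wavelet_denoise(noisy, levels=3)

    def mse(est):
        return sum((e - c) ** 2 for e, c in zip(est, clean)) / len(clean)

    return mse(noisy), mse(denoised)
```

Calling `demo()` returns the mean squared error of the trace before and after denoising, which gives a quick check that the step in the synthetic trace is preserved while the random fluctuations are reduced. In practice a wavelet library and a smoother wavelet family would likely be preferable to this hand-written Haar transform.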