The rapid growth in waste generation is a significant global challenge with serious implications, and addressing it requires better waste management processes. This study introduces a method that improves waste separation by combining learning models at two levels. The method first encodes each image as a new feature matrix using the Multi-Scale Local Binary Pattern (MLBP) technique, which represents features and patterns at several scales. At the first level, an ensemble combines two Convolutional Neural Network (CNN) models, each of which performs detection independently. At the second level, a further CNN model takes the information produced by the first-level models and fuses it to obtain a more accurate final output. The study’s novelty lies in using this second-level CNN to fuse the first-level results, in place of conventional fusion rules such as voting and averaging. In addition, the study employs an MLBP feature selection approach to describe the HW image features more accurately, and it uses the Simulated Annealing (SA) algorithm to fine-tune the hyperparameters of the CNN models, thereby optimizing the system’s performance. The proposed method achieved an accuracy of 99.01% on the TrashNet dataset and 99.41% on the HGCD dataset, an improvement of at least 0.48% and 0.36%, respectively, over the other methods evaluated in this study.
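To make the MLBP step more concrete, the following is a minimal sketch of how an image can be turned into one matrix of local binary pattern codes per scale. It is an illustration under stated assumptions, not the implementation used in the study: the function names `lbp_map` and `multi_scale_lbp`, the radii `(1, 2, 3)`, the eight-neighbour square sampling without interpolation, and the edge clamping at image borders are all hypothetical choices made here for brevity.

```python
# Simplified multi-scale LBP sketch (assumptions: 8 neighbours on a square
# ring of the given radius, no interpolation, borders handled by clamping).


def lbp_map(image, radius):
    """Return a matrix of 8-bit LBP codes for a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    # Eight neighbour offsets (dy, dx), clockwise from the top-left corner.
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius), (0, radius),
               (radius, radius), (radius, 0), (radius, -radius), (0, -radius)]
    codes = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Clamp coordinates so border pixels reuse the nearest edge pixel.
                ny = min(max(y + dy, 0), height - 1)
                nx = min(max(x + dx, 0), width - 1)
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[ny][nx] >= center:
                    code |= 1 << bit
            codes[y][x] = code
    return codes


def multi_scale_lbp(image, radii=(1, 2, 3)):
    """Stack one LBP code map per radius: result[k][y][x] is the code at radii[k]."""
    return [lbp_map(image, r) for r in radii]


if __name__ == "__main__":
    toy = [[10, 20, 30, 40],
           [15, 25, 35, 45],
           [12, 22, 32, 42],
           [18, 28, 38, 48]]
    features = multi_scale_lbp(toy)
    # One code map per radius, each the same size as the input image.
    print(len(features), len(features[0]), len(features[0][0]))  # prints: 3 4 4
```

In a pipeline like the one described, the stacked code maps would play the role of the new feature matrix passed to the first-level models. A production version would more likely use circular sampling with interpolation, as library implementations of LBP typically do.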