Abstract

The geological conditions of the Himalayan region are highly complex and challenging. So far, rock mass characterization in the Himalayas has commonly relied on empirical and analytical approaches. Because the input parameters and governing equations used in design practice have limitations, reliable rock mass characterization in tunnels excavated by tunnel boring machine (TBM) is crucial. This research introduces robust machine learning (ML) approaches to predict rock mass quality in complex geological environments, leveraging a large database of TBM parameters and rock mass rating (RMR) values. To this end, a total of 6879 stable-phase TBM cycle records were collected from a 12 km long tunnel in Nepal. The pre-processed data were randomly split into a training set (80%) and a testing set (20%). Seven individual classifiers, namely logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), k-nearest neighbor (KNN), extreme gradient boosting (XGBoost), and bagging, together with a stacking ensemble classifier, were applied with optimized hyperparameters. The comprehensive assessment showed that the ensemble classifier achieved the highest overall accuracy compared with the individual classifiers. More importantly, the synthetic minority over-sampling technique (SMOTE) improved the handling of the imbalanced database, and the RF and stacking classifiers demonstrated the best prediction performance, with an accuracy of 92%. Moreover, for the minority rock mass class, RF performed better than the stacking classifier. The authors emphasize that the effective application of ML-based, data-driven approaches shows substantial potential for rock mass characterization in TBM tunnelling.
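The workflow summarized above (random 80/20 split, SMOTE on the imbalanced training data, then RF and a stacking ensemble) might be sketched roughly as follows. This is only an illustrative sketch under several assumptions: the data are synthetic (`make_classification` stands in for the TBM parameters and RMR classes), the `smote_oversample` helper is a simplified, hand-written version of SMOTE rather than the implementation used in the study, the stacking base learners are an arbitrary subset of the classifiers named above, and all hyperparameters are library defaults rather than the tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): create synthetic samples by
    interpolating between a minority sample and one of its k nearest
    same-class neighbours until every class matches the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = int(target - count)
        if n_new == 0:
            continue  # already the largest class
        X_cls = X[y == cls]
        n_neighbors = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # Column 0 is normally the query point itself, so it is dropped.
        neigh_idx = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        pick = neigh_idx[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[pick] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced 3-class data standing in for TBM features / RMR classes.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5, n_classes=3,
    weights=[0.70, 0.25, 0.05], random_state=42,
)

# Random 80/20 split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Oversample the training set only, so the test set stays untouched.
X_train_bal, y_train_bal = smote_oversample(X_train, y_train, k=5, seed=42)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "Stacking": StackingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(random_state=42)),
            ("knn", KNeighborsClassifier()),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Oversampling is applied after the split and only to the training portion, so that synthetic samples cannot leak into the test set. The accuracies printed for the synthetic data have no relation to the 92% reported for the tunnel dataset.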