Robust rainfall-runoff prediction under a changing climate remains a grand challenge for conceptual hydrological models. Although the multi-model ensemble (MME) is considered useful for improving the robustness of hydrological prediction, it has yet to be thoroughly evaluated. We evaluated the robustness of an MME of 44 conceptual hydrological models in 582 river basins. We found that the MME was more accurate and more robust than any individual model. Although the performance of the MME degraded in the validation period, the degradation was smaller for the MME than for individual models, especially when the climatology of river discharge in the validation period differed greatly from that in the calibration period. This suggests that the MME is robust to climate change. We also found it difficult to quantify the robustness of the MME when the number of basins and models was small, which highlights the importance of using a large number of models and watersheds to evaluate robustness and uncertainty in hydrological prediction.
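As an illustration of the evaluation logic summarized above, the following minimal sketch computes a skill score for each model and for an ensemble, and measures how much that score drops from the calibration period to the validation period. It rests on several assumptions that the abstract does not state: the ensemble is taken as an equal-weight mean of the member simulations, skill is measured with the Nash-Sutcliffe efficiency (NSE), and the function names `nse`, `ensemble_mean`, and `degradation` as well as the toy discharge values are hypothetical.

```python
from statistics import fmean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (assumed metric): 1.0 is a perfect fit,
    0.0 is no better than predicting the observed mean."""
    mean_obs = fmean(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def ensemble_mean(sims):
    """Equal-weight ensemble (assumed combination rule).
    sims is a list of simulated series, one per model, all the same length."""
    return [fmean(step) for step in zip(*sims)]


def degradation(obs_cal, sim_cal, obs_val, sim_val):
    """Loss of skill from calibration to validation.
    Positive values mean the simulation is worse in the validation period."""
    return nse(obs_cal, sim_cal) - nse(obs_val, sim_val)


# Toy discharge series (illustrative values only, not data from the study).
obs = [1.0, 2.0, 3.0, 4.0]
model_a = [2.0, 3.0, 4.0, 5.0]  # biased high by one unit
model_b = [0.0, 1.0, 2.0, 3.0]  # biased low by one unit

mme = ensemble_mean([model_a, model_b])
print("NSE model A:", nse(obs, model_a))
print("NSE model B:", nse(obs, model_b))
print("NSE ensemble:", nse(obs, mme))
```

In this toy case the opposite biases of the two members cancel in the ensemble mean, so the ensemble scores higher than either member. Applying `degradation` to each model and to the ensemble, basin by basin, would give one possible way to compare their robustness between periods.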