To adequately define risk in an operational setting, both modeling uncertainty and benchmark data uncertainty must be addressed. Although metrics to evaluate model performance are numerous in the literature, few integrate either modeling uncertainty or benchmark data uncertainty, and even fewer integrate both. The Combined Overlap Percentage (COP) ensemble metric is a notable exception: it optimizes two trade-off objectives with equal weight, maximizing the overlap between simulated and benchmark uncertainty bounds (overlap-reliability) while minimizing the width of the simulated ensemble uncertainty bounds (overlap-sharpness). We further develop the COP by assessing weighting methods that extend its applicability to additional types of benchmark data uncertainty. As new advanced datasets are generated each year, the weighted COP can integrate ensembles of benchmark data at a low computational cost, rather than forcing modelers to identify a single best product, and the new weighting method allows the COP to adapt to the unique features of those new datasets. Results suggest increasing the weight of overlap-sharpness when robust benchmark uncertainty estimates are available and, conversely, increasing the weight of overlap-reliability when little benchmark uncertainty information is available. Finally, timestep weighting and data transforms are impactful only when overlap-sharpness is prioritized. These results are particularly relevant in an operational context and could allow uncertainty to be integrated into calibration and ensemble generation at a low computational cost.
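To make the weighting idea concrete, the sketch below combines a per-timestep overlap-reliability term and an overlap-sharpness term with user-chosen weights and optional timestep weights. It is illustrative only: the function name, the specific reliability and sharpness formulas, and the normalization are assumptions made for this example, not the COP definition used in the paper.

import numpy as np

def weighted_cop(sim_lower, sim_upper, bench_lower, bench_upper,
                 w_reliability=0.5, w_sharpness=0.5, timestep_weights=None):
    # Hypothetical weighted overlap score (not the paper's exact COP formula).
    # Inputs are arrays of lower/upper uncertainty bounds per timestep for the
    # simulated ensemble and the benchmark data.
    sim_lower, sim_upper = np.asarray(sim_lower, float), np.asarray(sim_upper, float)
    bench_lower, bench_upper = np.asarray(bench_lower, float), np.asarray(bench_upper, float)

    # Width of the intersection of the simulated and benchmark bands per timestep.
    overlap = np.maximum(0.0, np.minimum(sim_upper, bench_upper)
                              - np.maximum(sim_lower, bench_lower))
    bench_width = np.maximum(bench_upper - bench_lower, 1e-12)
    sim_width = np.maximum(sim_upper - sim_lower, 1e-12)

    # Overlap-reliability: fraction of the benchmark band covered by the
    # simulated band (in [0, 1], higher is better).
    reliability = overlap / bench_width

    # Overlap-sharpness: rewards narrow simulated bands relative to the
    # benchmark band, capped at 1 so very wide simulated bands are penalized.
    sharpness = np.minimum(bench_width / sim_width, 1.0)

    score = w_reliability * reliability + w_sharpness * sharpness

    # Optional per-timestep weighting (e.g., to emphasize high-flow periods).
    if timestep_weights is not None:
        return float(np.average(score, weights=np.asarray(timestep_weights, float)))
    return float(score.mean())

For instance, calling weighted_cop(..., w_reliability=0.7, w_sharpness=0.3) would emphasize coverage of the benchmark bounds, in line with the recommendation to favor overlap-reliability when little benchmark uncertainty information is available; the reverse weighting would favor sharpness when robust benchmark uncertainty estimates exist.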