Abstract

During the summers of 2016 and 2017, the Center for Analysis and Prediction of Storms (CAPS) ran real-time storm-scale ensemble forecasts (SSEFs) in support of the Hydrometeorology Testbed (HMT) Flash Flood and Intense Rainfall (FFaIR) experiment. These forecasts used WRF-ARW and the Nonhydrostatic Mesoscale Model on the B-grid (NMMB) in 2016, and WRF-ARW and the GFDL Finite Volume Cubed-Sphere Dynamical Core (FV3) in 2017. They covered the contiguous United States at 3-km horizontal grid spacing and supported the generation and evaluation of precipitation forecast products, including ensemble probabilistic products. Forecasts of 3-h precipitation accumulation are evaluated here. Overall, the SSEF produces skillful 3-h accumulated precipitation forecasts, with ARW members generally outperforming NMMB members and the single FV3 member run in 2017 outperforming ARW members; these differences are statistically significant at some forecast hours. Performance, measured by bias and equitable threat score (ETS), differs significantly among subensembles of members sharing a common microphysics or planetary boundary layer (PBL) scheme. Year-to-year consistency is higher for PBL subensembles than for microphysics subensembles. Probability-matched (PM) ensemble mean forecasts outperform individual members, while the simple ensemble mean exhibits substantial bias. A newly developed localized probability-matched (LPM) ensemble mean product was produced in 2017; compared to the simple ensemble mean and the conventional PM mean, the LPM mean better retains small-scale structures, as is evident in both 2D forecast fields and variance spectra. Probabilistic forecasts of precipitation exceeding flash flood guidance (FFG) or thresholds associated with recurrence intervals (RI) ranging from 10 to 100 years show utility in identifying regions of flooding threat but generally overpredict the occurrence of such events; they may nonetheless be useful in subjective flash flood risk assessment.