Abstract

The High-Resolution Rapid Refresh (HRRR) model provides hourly updating forecasts of convective-scale phenomena across the United States, which can be used to infer the potential for convective hazards (e.g., tornadoes, hail, and wind gusts). We used deterministic 2019–20 HRRR, version 4 (HRRRv4), forecasts to train neural networks (NNs) to generate 4-hourly probabilistic convective hazard forecasts [neural network probability forecasts (NNPFs)] for HRRRv4 initializations in 2021, using storm reports as ground truth. NNPF skill was compared with that of a smoothed updraft helicity (UH) baseline forecast to quantify the benefit of the NNs. NNPF skill varied by initialization time and time of day but always exceeded that of the UH baseline. NNPFs valid between 1800 and 0000 UTC were the most skillful in aggregate, significantly exceeding the baseline forecast skill. Overnight NNPFs (i.e., valid 0600–1200 UTC) were the least skillful, indicating a diurnal cycle in hazard predictability that was present across all HRRRv4 initializations. We explored the sensitivity of HRRRv4 NNPF skill to NN training choices. Including an additional year of 2021 HRRRv4 forecasts for training slightly improved skill for 2022 HRRRv4 NNPFs, while reducing the training dataset size by 40% by retaining only forecasts with storm reports did not degrade forecast skill. Finally, NNs trained with 2018–20 HRRRv3 forecasts produced less skillful NNPFs when applied to 2021 HRRRv4 forecasts. In addition to documenting practical predictability challenges in convective hazard prediction, these findings reinforce the need for a consistent model configuration when training NNs and provide best practices for constructing a training dataset from operational convection-allowing model forecasts.

Significance Statement

Convective hazards, such as hail and tornadoes, are often challenging to predict.
To improve hazard predictions, we used machine learning (ML) to generate forecasts of convective hazards across the United States by leveraging forecasts of past events. The ML hazard forecasts were consistently better than a non-ML approach and varied in skill with the time of day, with nighttime forecasts being particularly challenging. ML forecasts of wind gusts and hail were more skillful than those of tornadoes. Different strategies for constructing the dataset of past events led to differences in forecast performance; thus, this work provides recommendations for assembling these datasets and training ML models to generate improved forecasts of severe weather events.