Postpartum hemorrhage is a leading cause of maternal morbidity and mortality worldwide, with uterine atony identified as the underlying cause in up to 80% of cases. Several measures have been used to report uterine tone. The most commonly reported measure is a 0 to 10 numeric rating scale, but this scale has not been tested for reliability or agreement between raters. The primary purpose of this study was to evaluate the interrater reliability and agreement of the 0 to 10 visual numeric rating scale of uterine tone during cesarean delivery. A secondary purpose was to obtain estimates of scale responsiveness and minimal clinically important difference. Between August and November 2018, obstetricians used a 0 to 10 numeric rating scale to independently rate uterine tone by palpation of the uterus at 3 and 10 minutes after cesarean delivery, where "0" represented "no tone" and "10" represented "excellent tone." Each obstetrician, blinded to the other's score, pointed to a numeric rating scale held by the anesthesiologist through a clear sterile drape. Intraclass correlation coefficients and Bland-Altman analysis were used to assess interrater reliability and agreement, respectively. Standardized response mean and standard error of measurement were used to obtain estimates of responsiveness and minimal clinically important difference, respectively. A total of 82 and 84 pairs of scores were collected at 3 and 10 minutes, respectively, from pairs of raters drawn from 62 unique obstetricians. The mean (±standard deviation) difference in scores between rater 1 and rater 2 was 0.4±1.4 at 3 minutes and 0.1±1.1 at 10 minutes.
Intraclass correlation coefficients for a future single rater (intraclass correlation coefficient [1, 1]) at 3 and 10 minutes were 0.67 (95% confidence interval, 0.53-0.77) and 0.61 (95% confidence interval, 0.46-0.73), and for the average of 2 future raters (intraclass correlation coefficient [1, 2]), they were 0.80 (95% confidence interval, 0.71-0.87) and 0.76 (95% confidence interval, 0.63-0.84), indicating good reliability for a single rater and excellent reliability for the average of 2 raters. Bland-Altman analysis estimated 95% limits of agreement between raters of -2.4 (95% confidence interval, -3.0 to -1.9) to 3.1 (95% confidence interval, 2.6-3.7) at 3 minutes and -2.1 (95% confidence interval, -2.5 to -1.7) to 2.4 (95% confidence interval, 2.0-2.8) at 10 minutes, consistent with good interrater agreement at both time points. The standardized response mean from 3 to 10 minutes after delivery was 1.1 (n=81). The standard error of measurement was 1.0 (95% confidence interval, 0.9-1.1) at 3 minutes and 0.8 (95% confidence interval, 0.7-0.9) at 10 minutes. The 0 to 10 numeric rating scale for uterine tone demonstrated good interrater reliability with 1 rater and excellent interrater reliability with the average of 2 raters, as well as good interrater agreement. The scale was responsive to within-parturient change in tone, and preliminary estimates of the minimal clinically important difference were obtained. The 0 to 10 numeric rating scale for uterine tone may be a reliable, standardized tool for future research in reporting the degree of uterotonic contraction during cesarean delivery.
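
As a rough illustration of how the reported agreement and reliability figures relate to one another, the sketch below recomputes Bland-Altman 95% limits of agreement from the rounded mean and standard deviation of the rater differences, and applies the Spearman-Brown formula to step the single-rater coefficient up to the average of 2 raters. It is not the authors' analysis code: the function names are hypothetical, the normal-approximation multiplier of 1.96 is an assumption, and the inputs are the rounded values quoted in the abstract, so the results only approximate the published limits.

```python
def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Bland-Altman limits: mean difference +/- z * SD of the differences.

    z=1.96 is the usual normal-approximation multiplier for 95% limits
    (an assumption here; the abstract does not state the multiplier used).
    """
    half_width = z * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def spearman_brown(icc_single, k=2):
    """Reliability of the average of k raters, given single-rater reliability.

    For the one-way random-effects model this maps ICC(1,1) to ICC(1,k).
    """
    return k * icc_single / (1 + (k - 1) * icc_single)


# Rounded summary values quoted in the abstract:
# (label, mean difference, SD of differences, ICC(1,1))
reported = [
    ("3 min", 0.4, 1.4, 0.67),
    ("10 min", 0.1, 1.1, 0.61),
]

for label, mean_diff, sd_diff, icc_single in reported:
    lower, upper = limits_of_agreement(mean_diff, sd_diff)
    icc_avg = spearman_brown(icc_single, k=2)
    print(f"{label}: limits of agreement {lower:.2f} to {upper:.2f}, "
          f"ICC(1,2) ~ {icc_avg:.2f}")
```

With these rounded inputs the limits come out at roughly -2.3 to 3.1 at 3 minutes and -2.1 to 2.3 at 10 minutes, close to the published -2.4 to 3.1 and -2.1 to 2.4; the small gaps are consistent with rounding of the mean and standard deviation. The stepped-up coefficients round to 0.80 and 0.76, matching the reported values for the average of 2 raters.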