Dose-volume histograms (DVHs), along with the dose and volume metrics derived from them, are central to radiotherapy planning, so calculation errors can significantly affect the selection of treatment plans. A dose distribution that passes plan checks in one treatment planning system (TPS) may fail the same checks in another, even when identical structures and dose grid information are used. This work presents the design and implementation of methods for assessing the accuracy of dose and volume computations performed by TPSs and other analytical tools. We demonstrate examples in which calculation differences between systems change the assessment of a plan's clinical acceptability. Our work also provides a more detailed DVH analysis of single targets than earlier published studies, which is relevant to stereotactic radiosurgery (SRS) plans and small-structure dose assessments. Very small structures are particularly problematic because of their coarse digital representation, and the impact of this is examined thoroughly.

Reference DVH curves were derived analytically from Gaussian dose distributions centered on spherical structures and taken as ground truth. The structures and dose distributions were generated synthetically and imported into RayStation, MasterPlan, and ProKnow, and each system's DVH calculation was compared against the analytical reference. Two commonly used dose metrics, PCI and MGI, were used to determine the limits of calculation accuracy for small structures. In addition, to measure DVH differences across a larger range of commercial DVH calculators, the D95 metric from a set of real clinical plans was compared across the three DVH calculators under test and across a further six TPSs from other hospitals.

We show that even slight deviations between the results of DVH calculators can lead to plan check failures, and we illustrate this with the commonly used D95 planning metric.
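For a spherically symmetric Gaussian dose centered on a spherical structure, the cumulative DVH has a closed form, which is what makes such synthetic phantoms usable as ground truth. A minimal sketch of that derivation, with illustrative parameters that are not those used in the study:

```python
import math

def gaussian_dvh(d, d0=100.0, sigma=10.0, radius=10.0):
    """Cumulative DVH V(d): fraction of a sphere of the given radius (mm)
    receiving at least dose d, for a spherically symmetric Gaussian dose
    D(r) = d0 * exp(-r**2 / (2 * sigma**2)) centered on the sphere.
    Illustrative parameters only, not taken from the paper."""
    if d >= d0:
        return 0.0
    # D(r) falls monotonically with r, so dose >= d exactly inside
    # the radius r_d where D(r_d) = d.
    r_d = sigma * math.sqrt(2.0 * math.log(d0 / d))
    # Volume fraction of the sphere inside r_d, capped at 1.
    return min(1.0, (r_d / radius) ** 3)
```

Because this reference curve is exact, any deviation in a commercial system's DVH at a given dose level can be attributed to that system's voxelization and binning, not to uncertainty in the ground truth.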
We present clinical data from eight planning systems highlighting instances where a plan check would pass in one system and fail in another purely because of DVH calculation differences. For the smallest volumes tested, errors of up to 20% were observed in the DVHs. RayStation was tested down to a sphere of 3 mm radius (≈0.1 cc), which showed close to 10% error, reducing to 1% at 10 mm radius (≈4.0 cc) and 0.1% at 20 mm radius (≈33 cc). In clinical plans, the variation in D95 was up to 9% for the smallest volumes, typically around 2% for volumes of 0.5 cc to 20 cc and 1% for 20 cc to 70 cc, falling below 0.1% for large volumes. The Paddick Conformity Index (PCI) and Modified Gradient Index (MGI) are commonly used plan quality indicators for very small volumes; for volumes of ≈0.1 cc we observed errors of up to 40% in PCI and up to 75% in MGI.

Our study extends the range of DVH calculators tested in published work and characterizes their performance over a wider range of volume sizes. We provide quantitative evidence of the critical need to test the accuracy of a TPS's DVH calculators before clinical use. This work is particularly relevant both for stereotactic plan evaluation and for the assessment of small-volume doses against published dose constraint recommendations. We demonstrate that significant errors can occur in DVHs for volumes below 1 cc, even when the volumes themselves are calculated accurately. Even for large structures, deviations between the outputs of DVH calculators can lead to indicated or reported plan check failures if the checks do not include appropriate tolerances. We therefore urge caution in the use of DVH metrics for these very small volumes and recommend that appropriate DVH uncertainty tolerances be set on organ dose constraints when using them to evaluate clinical plans.
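The D95 comparisons above depend in part on how each system interpolates its cumulative DVH, one plausible source of the inter-system differences reported. A hypothetical sketch of a linear-interpolation D95 lookup and of the standard Paddick Conformity Index formula; neither function is taken from any of the systems tested:

```python
def dose_at_volume(dose_bins, cum_vol, v_frac=0.95):
    """D_v (e.g. D95 for v_frac=0.95) from a cumulative DVH by linear
    interpolation.  dose_bins: ascending doses; cum_vol: fraction of the
    structure receiving at least that dose (non-increasing).
    Hypothetical helper, not the algorithm of any system tested."""
    for i in range(1, len(dose_bins)):
        if cum_vol[i] < v_frac:
            # Interpolate between bins i-1 and i.
            f = (v_frac - cum_vol[i - 1]) / (cum_vol[i] - cum_vol[i - 1])
            return dose_bins[i - 1] + f * (dose_bins[i] - dose_bins[i - 1])
    return dose_bins[-1]

def paddick_ci(tv, piv, tv_piv):
    """Paddick Conformity Index: PCI = TV_PIV**2 / (TV * PIV), where TV
    is the target volume, PIV the prescription isodose volume, and
    TV_PIV the target volume covered by the prescription isodose."""
    return tv_piv ** 2 / (tv * piv)
```

An ideal plan (TV = PIV = TV_PIV) gives PCI = 1. Because small volumes enter PCI both squared and in the denominator, percent-level volume errors at ≈0.1 cc are amplified in the index, which is one way DVH inaccuracies propagate into plan quality metrics.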