An estimated US$28 billion per year is reportedly spent on preclinical studies that are not reproducible. FLASH studies may suffer from the same reproducibility crisis because FLASH beamlines are non-standard and few dosimeters function at ultra-high dose rates. Different institutions have reported different outcomes for the FLASH effect, even when similar beamlines, temporal structures, and nominal dose levels were used. This raises the question of whether dosimetry under FLASH conditions is accurate enough to allow a fair comparison between FLASH and conventional dose-rate (CONV) irradiation. To answer this question, we developed and characterized an anatomically realistic 3D-printed mouse phantom for use in a multi-institutional dosimetric benchmarking effort. Mesh files for bony anatomy, lungs, and soft tissue, derived from a CT scan of a mouse, were converted to an editable 3D model. The 3D model was cut along the coronal plane and modified to accommodate radiographic film. The phantom was printed with a multi-material approach on a dual-nozzle 3D printer: one nozzle extruded acrylonitrile butadiene styrene (ABS) to mimic soft tissue and the other extruded polylactic acid (PLA) to mimic bone, with both materials deposited in a single print. Lungs were approximated by lightweight PLA, printed separately, and inserted into the corresponding cavities in the phantom. Hounsfield units (HU) and print-to-print stability were verified. Radiographic films were laser cut to fit the different anatomical sites. Two institutions have taken part in this study so far, with data pending from three more. The institutions were instructed to deliver 10 Gy to the plane of the film for whole-abdomen, whole-lung, and brain irradiations. 2D dose maps were compared between FLASH and CONV, and the deviation from the prescribed dose was also measured.
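The dose map comparison described above can be illustrated with a minimal sketch. It assumes each film readout is available as a 2D grid of dose values in Gy, and it normalizes the FLASH-CONV difference to the CONV mean; the abstract does not state which normalization was used, so that choice, the function names, and the toy dose values are all hypothetical.

```python
from statistics import mean

PRESCRIBED_GY = 10.0  # prescription to the film plane, as stated in the text


def mean_dose(dose_map):
    """Mean dose (Gy) over a 2D dose map given as a list of rows."""
    return mean(value for row in dose_map for value in row)


def compare_dose_maps(flash_map, conv_map, prescribed=PRESCRIBED_GY):
    """Summarize FLASH vs CONV mean doses and deviations from prescription.

    Assumption: percentage differences between modalities are taken
    relative to the CONV mean dose; deviations from prescription are
    taken relative to the prescribed dose.
    """
    flash = mean_dose(flash_map)
    conv = mean_dose(conv_map)
    diff = flash - conv
    return {
        "flash_mean_gy": flash,
        "conv_mean_gy": conv,
        "flash_minus_conv_gy": diff,
        "flash_minus_conv_pct": 100.0 * diff / conv,
        "flash_dev_from_rx_pct": 100.0 * (flash - prescribed) / prescribed,
        "conv_dev_from_rx_pct": 100.0 * (conv - prescribed) / prescribed,
    }


# Toy 2x2 dose maps (Gy); illustrative values only, not measured data.
flash_map = [[10.4, 10.6], [10.5, 10.5]]
conv_map = [[9.9, 10.1], [10.0, 10.0]]

result = compare_dose_maps(flash_map, conv_map)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

In practice the mean would be taken over a region of interest within the irradiated field rather than over the whole film, and the same summary would be computed per anatomical site and per institution.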
The densities of the 3D-printed soft tissue, bone, and lung were measured to be approximately 1.01 g/cc, 1.22 g/cc, and 0.44 g/cc, respectively. For soft tissue and bone, the print-to-print difference was < 10 HU. The greatest variation was within the lungs (54 HU), but this had a minimal effect on the dose distribution (< 1%). For the two institutions that have completed the survey, the maximum average difference between FLASH and CONV across all irradiations was 0.75 Gy (7.48%). The maximum average difference from the prescribed dose across all irradiations and both institutions was 0.7 Gy (7.20%). The largest discrepancies were generally observed for lung irradiation, suggesting that the lack of treatment planning systems limits the ability to prescribe dose accurately in regions of tissue inhomogeneity. A 3D-printed, anatomically realistic mouse phantom was developed, characterized, and used in a multi-institutional dosimetric benchmarking effort. Such studies are essential for the clinical translation of FLASH because they help reduce variability from one institution to another.