This study compared the dosimetric impact of the metal artifact reduction (MAR) algorithms from all major commercial vendors with one another and with a novel in-house technique (AMPP), using an anthropomorphic head phantom.
The phantom was an Alderson phantom modified to allow both artifact-filled and baseline (artifact-free) computed tomography (CT) scans by means of teeth capsules made with either metal amalgams or bone-equivalent materials. It also included a cylindrical insert, accessible from the bottom of the neck, that introduced soft tissue features used in the analysis. The phantom was scanned with the metal teeth in place using each vendor's MAR algorithm: OMAR (Philips), iMAR (Siemens), SEMAR (Canon), and SmartMAR (GE); the AMPP algorithm was designed in-house. Uncorrected and baseline (bone-equivalent teeth) image sets were also acquired using a Siemens scanner. Proton spot scanning treatment plans were designed on the baseline image set for five targets in the phantom. Once optimized, the proton beams were copied onto the MAR-corrected and uncorrected image sets, with no reoptimization of the beams' parameters, to evaluate how the dose distributions on those image sets differed. Dose distribution differences were evaluated by comparing dose-volume histogram (DVH) metrics, including planning target volume (PTV) D95 and clinical target volume (CTV) D99 coverage, V100, D0.03cc, and heterogeneity indexes, along with a qualitative analysis and a water equivalent thickness (WET) analysis.
Both uncorrected CT metal artifacts and the commercial MAR algorithms degraded the proton dose distributions for all five target shapes and locations, and did so inconsistently, sometimes overdosing the planning target volumes by as much as 11.1% (D0.03cc) or underdosing them by as much as 11.7% (V100). The AMPP-corrected images, however, provided dose distributions that consistently agreed with the baseline dose distribution.
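As an illustration of the DVH metrics named above, the following is a minimal Python sketch, not the analysis code used in the study. It assumes the dose inside a structure is available as a flat NumPy array of per-voxel doses with equal voxel volumes; the function names (`dose_at_volume`, `volume_at_dose`, `dose_at_absolute_volume`, `percent_difference`) are hypothetical. Heterogeneity indexes are left out because the text does not state which definition was used.

```python
import numpy as np


def dose_at_volume(doses, volume_percent):
    """D_x: minimum dose received by the hottest x% of the structure.

    Assumes equal voxel volumes, so the hottest x% of voxels are those
    above the (100 - x)th percentile of the per-voxel doses.
    """
    return float(np.percentile(doses, 100.0 - volume_percent))


def volume_at_dose(doses, dose_threshold):
    """V_x: percentage of the structure receiving at least dose_threshold."""
    return 100.0 * float(np.mean(np.asarray(doses) >= dose_threshold))


def dose_at_absolute_volume(doses, voxel_volume_cc, volume_cc=0.03):
    """Minimum dose within the hottest `volume_cc` of the structure (e.g. D0.03cc)."""
    # Convert the absolute volume to a voxel count (at least one voxel).
    n_voxels = max(1, int(round(volume_cc / voxel_volume_cc)))
    hottest = np.sort(np.asarray(doses))[::-1][:n_voxels]
    return float(hottest.min())


def percent_difference(value, baseline):
    """Signed difference of a metric from its baseline value, in percent."""
    return 100.0 * (value - baseline) / baseline
```

With these helpers, a metric such as PTV D95 would be computed once on the baseline dose and once on each MAR-corrected dose, and `percent_difference` would express the over- or underdosing relative to the baseline.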
The dosimetry results also suggest that the performance of the commercial MAR algorithms varied more with target location than with target shape. Once relocated farther from the metal, the target showed dose distributions that agreed more closely with the baseline for all commercial solutions, with overdosing improving by as much as 6%, which implies inadequate Hounsfield unit (HU) correction by the commercial MAR algorithms. Compared with the baseline, the commercial algorithms considerably altered the shapes of the HU profiles, and reference HU values differed by amounts that correspond to stopping power differences of 2.7-10%. The AMPP algorithm plans showed the smallest WET differences from the baseline (0.06 cm on average), while the commercial image sets produced differences ranging from 0.11 to 0.54 cm.
Computed tomography metal artifacts negatively impacted proton dose distributions for all five targets analyzed. The commercial MAR solutions performed inconsistently across the targets compared to the metal-free baseline. Insufficient CTV coverage and an increased number of hotspots were observed with all commercial solutions. Dose distribution errors were related to proximity to the artifacts, demonstrating the inability of the commercial techniques to adequately correct severe artifacts. In contrast, AMPP consistently showed the dose distributions that best matched the baseline, likely because it makes use of accurate HU information rather than the interpolated data used by commercial algorithms.
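The WET comparison reported above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the study's method: the HU-to-relative-stopping-power calibration points (`HU_POINTS`, `RSP_POINTS`) are hypothetical placeholders, since a real calibration curve is scanner specific, and the HU profile is assumed to be sampled at a uniform step along the beam path.

```python
import numpy as np

# Hypothetical HU -> relative stopping power (RSP) calibration points.
# A clinical curve is measured per scanner; these values are placeholders.
HU_POINTS = np.array([-1000.0, 0.0, 1000.0, 3000.0])
RSP_POINTS = np.array([0.0, 1.0, 1.5, 2.5])


def water_equivalent_thickness(hu_profile, step_cm):
    """WET in cm: sum of RSP along the path times the sampling step."""
    # Piecewise-linear lookup of RSP for every HU sample on the path.
    rsp = np.interp(np.asarray(hu_profile, dtype=float), HU_POINTS, RSP_POINTS)
    return float(np.sum(rsp) * step_cm)


def wet_difference(hu_profile, hu_baseline, step_cm):
    """Absolute WET difference (cm) between a corrected and a baseline profile."""
    return abs(
        water_equivalent_thickness(hu_profile, step_cm)
        - water_equivalent_thickness(hu_baseline, step_cm)
    )
```

Under this sketch, an HU error left behind by a MAR algorithm along the beam path changes the summed RSP, which is why residual HU inaccuracies appear directly as WET differences from the baseline.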