Proton density fat fraction (PDFF) is a validated biomarker for quantifying tissue fat. However, validation has been limited to single-center or multi-center series using non-FDA-approved software. We therefore assessed the bias, linearity, and long-term reproducibility of PDFF obtained using commercial PDFF packages from several vendors. Over 35 months, 438 subjects and 16 volunteers from a multi-center observational trial underwent PDFF MRI measurements using a 3-T MR system from one of three different vendors or a 1.5-T system from one vendor. Fat-water phantom sets were measured as part of each subject's examination. Manual region-of-interest measurements were made on the %fat image, and cross-sectional bias, linearity, and long-term reproducibility were then assessed. Three hundred ninety-two phantom measurements were evaluable (90%). Bias ranged from 2.4% at the lowest weight %fat to -3.8% at the highest. Regression fits of PDFF against synthesis weight %fat showed negligible non-linear effects and a linear slope of 0.94 (95% confidence interval: 0.938, 0.947). We observed significant vendor (p < 0.001) and field strength (p < 0.001) differences in bias and longitudinal variability. When the results were pooled across sites, vendors, and field strengths, the estimated reproducibility coefficient was 6.93% (95% CI: 6.25%, 7.81%). This study demonstrated good linearity, accuracy, and reproducibility for all investigated manufacturers and field strengths. However, significant vendor-dependent and field strength-dependent biases were found. Longitudinal PDFF measurements may be made on MR systems of different field strengths or vendors, but these results indicate that, when the MR system is not the same, only PDFF changes ≥ 7% can be considered a true difference.

• Phantom proton density fat fraction (PDFF) MRI measurements over 35 months demonstrated good linearity, accuracy, and reproducibility for the vendor systems investigated.
• Non-linear effects were negligible (linear slope of 0.94) over 0-100% fat; however, significant vendor (p < 0.001) and field strength (p < 0.001) differences in bias and longitudinal variability were identified. Bias ranged from 2.4% at 0 weight %fat to -3.8% at 100 weight %fat.
• Measurement bias could affect the accuracy of PDFF in clinical use. As the reproducibility coefficient was 6.93%, only changes in %fat larger than this value can be considered true differences when making longitudinal PDFF measurements on different MR systems.
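
The three quantities reported above (bias, linear slope, and reproducibility coefficient) can be illustrated with a minimal sketch. The abstract does not state which estimators were used, so the formulas below are assumptions: bias is taken as the mean signed difference from the synthesis weight %fat, the slope comes from an ordinary least-squares fit, and the reproducibility coefficient uses the common metrology definition RC = 1.96 × √2 × wSD, where wSD is the pooled within-vial standard deviation across repeated measurements. All function names and data values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean


def bias(measured, reference):
    """Mean signed difference (measured minus reference), in percentage points."""
    return mean([m - r for m, r in zip(measured, reference)])


def linear_slope(measured, reference):
    """Ordinary least-squares slope of measured PDFF against reference %fat."""
    mx, my = mean(reference), mean(measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, measured))
    sxx = sum((x - mx) ** 2 for x in reference)
    return sxy / sxx


def reproducibility_coefficient(repeats_by_vial):
    """RC = 1.96 * sqrt(2) * wSD (assumed definition, see lead-in).

    repeats_by_vial: one list of repeated PDFF readings per phantom vial,
    acquired under differing conditions (site, vendor, field strength, time).
    """
    sum_sq, dof = 0.0, 0
    for values in repeats_by_vial:
        m = mean(values)
        sum_sq += sum((v - m) ** 2 for v in values)
        dof += len(values) - 1
    wsd = math.sqrt(sum_sq / dof)  # pooled within-vial standard deviation
    return 1.96 * math.sqrt(2) * wsd


def is_true_change(pdff_before, pdff_after, rc):
    """A change counts as real only if its magnitude reaches the RC."""
    return abs(pdff_after - pdff_before) >= rc


# Hypothetical phantom data: reference weight %fat and simulated readings.
reference = [0.0, 10.0, 20.0, 50.0, 100.0]
measured = [0.94 * x + 2.4 for x in reference]

print("slope:", round(linear_slope(measured, reference), 3))
print("bias at 0 %fat:", round(bias(measured[:1], reference[:1]), 2))
print("bias at 100 %fat:", round(bias(measured[-1:], reference[-1:]), 2))

# Hypothetical repeated readings of two vials on different systems.
rc = reproducibility_coefficient([[10.0, 12.0], [20.0, 22.0]])
print("RC:", round(rc, 2))

# Applying the pooled RC reported in the abstract (6.93 percentage points).
print(is_true_change(10.0, 16.0, 6.93))
print(is_true_change(10.0, 18.0, 6.93))
```

With the reported pooled coefficient of 6.93%, a 6-point change measured on two different MR systems falls inside the expected measurement variability, whereas an 8-point change exceeds it. That is the reasoning behind the ≥ 7% threshold stated in the abstract.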