Abstract

Background
Serum free thyroxine (FT4) measurement is one of the most important clinical laboratory tests for the diagnosis and classification of thyroid disorders. In patient care, serum FT4 is currently measured mainly with immunoassays (IAs), but there are concerns about their accuracy. A reference measurement procedure (RMP) for FT4, based on well-defined equilibrium dialysis (ED) of serum, has been established for the standardization of routine FT4 IAs. Because the FT4 RMP is not intended for routine clinical application, in the current study we aimed to develop a high-throughput routine FT4 method based on ED-isotope dilution liquid chromatography tandem mass spectrometry (ED-ID-LC/MS/MS). The FT4 RMP was used as the reference to evaluate the measurement accuracy of both our ED-ID-LC/MS/MS method and an IA method.

Methods
FT4 was separated from protein-bound T4 in serum by 18-h ED using a commercially available micro-ED plate. 13C6-labeled T4 was added to the dialysate as an internal standard. FT4 in the dialysate was purified with a 96-well C18 solid-phase extraction (SPE) plate and analyzed by LC/MS/MS. The certified T4 primary reference material (IRMM-468) was used for assay calibration. FT4 IA measurements were performed on clinical analyzers (Roche e411). A set of 40 single-donor sera, covering an FT4 concentration range of 11.2–32.1 pmol/L, was measured by both the IA and our ED-ID-LC/MS/MS method. Deming regression analysis was performed to compare the results obtained by these two methods with the reference values assigned by the FT4 RMP.

Results
The described high-throughput ED-ID-LC/MS/MS method could simultaneously quantify T4, triiodothyronine (T3), and reverse triiodothyronine (rT3) in the dialysate within 4.5 min. The linear range of the routine FT4 assay covered 1–100 pg/mL.
The assay sensitivity allowed detection of 0.3 pg/mL FT4 in serum, which is sufficient for FT4 measurement across clinically relevant ranges, including samples from hypothyroid patients. The throughput of the ED-ID-LC/MS/MS method could be further improved by using an automated liquid handling system. Deming regression analysis showed good agreement between the RMP and the routine ED-ID-LC/MS/MS method, with a slope close to 1 (P > 0.05) and an intercept close to 0 (P > 0.05), indicating that results generated on the high-throughput ED-ID-LC/MS/MS platform are traceable to the FT4 RMP. In contrast, the IA tested showed both proportional bias (slope of 0.74, P < 0.05) and constant bias (intercept of 2.6, P < 0.05) relative to the RMP, indicating that calibration bias must be addressed to standardize the IA.

Conclusion
In summary, the accuracy, sensitivity, and throughput of the ED-ID-LC/MS/MS method are appropriate for FT4 measurements in clinical laboratories and in large epidemiologic studies aimed at establishing accurate reference intervals for FT4. Population data generated by the FT4 ED-ID-LC/MS/MS method will support the use of standardized FT4 measurements.
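
Illustrative sketch of the method comparison
The Deming regression comparison reported in the Results can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `deming_regression`, the assumption of equal error variances in both methods (`delta=1.0`), and the input values are all hypothetical. The inputs are synthetic points constructed to lie exactly on a line with the slope (0.74) and intercept (2.6) reported for the IA, so the fit simply recovers those two numbers.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming fit of y on x.

    delta is the ASSUMED ratio of the measurement-error variance in y
    to that in x; delta=1.0 corresponds to orthogonal regression.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sample variances and covariance
    s_xx = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_mean) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Synthetic, noise-free values for illustration only (NOT study data).
# Reference (RMP-like) values span the reported 11.2-32.1 pmol/L range.
rmp = [11.2, 14.0, 17.5, 21.0, 25.3, 32.1]
ia = [0.74 * v + 2.6 for v in rmp]

slope, intercept = deming_regression(rmp, ia)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A slope below 1 corresponds to proportional bias and a nonzero intercept to constant bias. The sketch returns point estimates only; the P values quoted in the abstract would require an additional step, such as jackknife or bootstrap confidence intervals, which is not shown here.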