Establishing local trimester-specific reference intervals for gestational TSH and free T4 (FT4) is often not feasible, necessitating alternative strategies. We aimed to systematically quantify the diagnostic performance of standardized modifications of center-specific nonpregnancy reference intervals compared with trimester-specific reference intervals. We included prospective cohorts participating in the Consortium on Thyroid and Pregnancy. After relevant exclusions, reference intervals were calculated per cohort in thyroperoxidase antibody-negative women. Modifications to the nonpregnancy reference intervals included an absolute modification (per 0.1 mU/L TSH or 1 pmol/L FT4), a relative modification (in steps of 5%), and fixed limits (upper TSH limit between 3.0 and 4.5 mU/L and lower FT4 limit between 5 and 15 pmol/L). We compared the prevalence of (sub)clinical hypothyroidism, sensitivity, and positive predictive value (PPV) obtained with these methodologies against population-based trimester-specific reference intervals. The final study population comprised 52 496 participants in 18 cohorts. Optimal modifications of standard reference intervals to diagnose gestational overt hypothyroidism were -5% for the upper limit of TSH and +5% for the lower limit of FT4 (sensitivity, 0.70; CI, 0.47-0.86; PPV, 0.64; CI, 0.54-0.74). For subclinical hypothyroidism, these were -20% for the upper limit of TSH and -15% for the lower limit of FT4 (sensitivity, 0.91; CI, 0.67-0.98; PPV, 0.71; CI, 0.58-0.80). Absolute and fixed modifications yielded similar results. CIs were wide, limiting generalizability. We could not identify modifications of nonpregnancy TSH and FT4 reference intervals that would enable centers to adequately approximate trimester-specific reference intervals. Future efforts should focus on studying the meaningfulness of trimester-specific reference intervals and risk-based decision limits.
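
As an illustration of the comparison described above, the following minimal sketch applies a relative modification to a nonpregnancy reference interval and computes sensitivity and PPV against a trimester-specific interval treated as the reference standard. It is not the study's analysis code. The names `Limits`, `relative_modification`, `is_overt_hypothyroid`, and `sensitivity_and_ppv` are hypothetical, all numeric limits and samples are invented for illustration, and overt hypothyroidism is assumed here to mean TSH above the upper limit together with FT4 below the lower limit.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Limits:
    """Decision limits used in this sketch (hypothetical container)."""
    tsh_upper: float  # upper TSH limit, mU/L
    ft4_lower: float  # lower FT4 limit, pmol/L


def relative_modification(limits: Limits, tsh_pct: float, ft4_pct: float) -> Limits:
    """Shift each limit by a percentage, e.g. -5 for TSH and +5 for FT4."""
    return Limits(
        tsh_upper=limits.tsh_upper * (1 + tsh_pct / 100),
        ft4_lower=limits.ft4_lower * (1 + ft4_pct / 100),
    )


def is_overt_hypothyroid(tsh: float, ft4: float, limits: Limits) -> bool:
    """Assumed definition: TSH above the upper limit and FT4 below the lower limit."""
    return tsh > limits.tsh_upper and ft4 < limits.ft4_lower


def sensitivity_and_ppv(
    samples: Iterable[Tuple[float, float]],
    test_limits: Limits,
    reference_limits: Limits,
) -> Tuple[float, float]:
    """Treat the trimester-specific classification as the reference standard."""
    tp = fp = fn = 0
    for tsh, ft4 in samples:
        test_pos = is_overt_hypothyroid(tsh, ft4, test_limits)
        ref_pos = is_overt_hypothyroid(tsh, ft4, reference_limits)
        if test_pos and ref_pos:
            tp += 1
        elif test_pos and not ref_pos:
            fp += 1
        elif ref_pos and not test_pos:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Invented example values, not taken from any cohort.
nonpregnancy = Limits(tsh_upper=4.0, ft4_lower=12.0)
trimester_specific = Limits(tsh_upper=3.5, ft4_lower=12.5)
modified = relative_modification(nonpregnancy, tsh_pct=-5, ft4_pct=+5)

# (TSH in mU/L, FT4 in pmol/L) pairs, again purely illustrative.
samples = [(4.2, 11.0), (3.6, 12.0), (3.9, 12.55), (2.0, 15.0), (5.0, 10.0)]
sens, ppv = sensitivity_and_ppv(samples, modified, trimester_specific)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Absolute modifications and fixed limits could be evaluated with the same `sensitivity_and_ppv` helper by constructing the `Limits` object differently, for example by adding a fixed offset to each limit or by setting the limits directly.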