Abstract Objective: To evaluate the agreement and reliability of intrapartum nonreassuring cardiotocography (CTG) interpretation and prediction of neonatal acidemia by obstetricians working in different centers. Methods: A retrospective cohort study involving two tertiary hospitals (The First Affiliated Hospital of Sun Yat-sen University and Peking University Third Hospital) was conducted between 30 September 2018 and 1 April 2019. Six obstetricians from the two hospitals, representing three levels of experience (junior, medium, and senior), reviewed 100 nonreassuring fetal heart rate (FHR) tracings from 1 hour before the onset of abnormalities until delivery. Each reviewer determined the FHR pattern, the baseline, the variability, and the presence of accelerations, decelerations, and a sinusoidal pattern, and predicted whether neonatal acidemia and an abnormal umbilical arterial pH (< 7.1) would occur. Inter-observer agreement was assessed using the proportion of agreement (Pa) and the proportion of specific agreement (Pa for each category). Reliability was evaluated with the kappa statistic (κ; Light's kappa for n raters) and Gwet's AC1 statistic. Results: Good inter-observer agreement was found in the evaluation of most variables (Pa > 0.5), with the exception of early deceleration (Pa = 0.39, 95% confidence interval (CI): 0.36, 0.43). Reliability was also good for most variables (AC1 > 0.40), except for acceleration, early deceleration, and prediction of neonatal acidemia (AC1 = 0.17, 0.10, and 0.25, respectively). There were no statistically significant differences among the three experience groups, except in the identification of accelerations (Pa = 0.89, 95% CI: 0.83, 0.95; Pa = 0.50, 95% CI: 0.41, 0.60; and Pa = 0.35, 95% CI: 0.25, 0.43 in the junior, medium, and senior groups, respectively) and the prediction of neonatal acidemia (Pa = 0.52, 0.52, and 0.62 in the junior, medium, and senior groups, respectively), where agreement in the junior group was the highest and the lowest, respectively.
The accuracy and sensitivity of the prediction of umbilical artery pH < 7.1 were similar among the three groups, but the specificity was higher in the senior group (93.68% vs. 92.53% vs. 98.85% in the junior, medium, and senior groups, respectively; P = 0.015). Conclusion: Although inter-observer agreement was statistically good for most of the basic CTG features and for the FHR category, it was insufficient to meet the clinical requirement for “no objection” interpretation of FHR tracings. Further specialized training is needed to standardize the interpretation of intrapartum FHR tracings.
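As an illustration of the two statistics named in the Methods, the following is a minimal Python sketch of how an overall proportion of agreement and Gwet's AC1 can be computed for several raters. It is not the study's analysis code. It assumes that Pa is the average pairwise agreement among raters, that every rater scored every tracing, and that the function name `agreement_stats` and the example ratings are hypothetical.

```python
from collections import Counter


def agreement_stats(ratings, categories):
    """Return (Pa, AC1) for a complete ratings table.

    ratings: list of rows, one row per tracing, one label per rater.
    categories: list of all possible labels (at least two).
    Assumption: no missing ratings, same number of raters per row.
    """
    n = len(ratings)          # number of tracings
    r = len(ratings[0])       # number of raters
    q = len(categories)       # number of categories

    pa_sum = 0.0
    prevalence = {c: 0.0 for c in categories}
    for row in ratings:
        counts = Counter(row)
        # Share of rater pairs that agree on this tracing.
        pa_sum += sum(k * (k - 1) for k in counts.values()) / (r * (r - 1))
        # Share of raters choosing each category on this tracing.
        for c in categories:
            prevalence[c] += counts.get(c, 0) / r

    pa = pa_sum / n
    # Chance agreement as defined for Gwet's AC1.
    pe = sum((p / n) * (1 - p / n) for p in prevalence.values()) / (q - 1)
    ac1 = (pa - pe) / (1 - pe)
    return pa, ac1


# Hypothetical data: 4 tracings, 3 raters, "acceleration present?" yes/no.
example = [
    ["yes", "yes", "yes"],
    ["yes", "yes", "no"],
    ["no", "no", "no"],
    ["yes", "no", "no"],
]
pa, ac1 = agreement_stats(example, ["yes", "no"])
print(f"Pa = {pa:.3f}, AC1 = {ac1:.3f}")
```

Unlike kappa, AC1 derives its chance-agreement term from category prevalence in a way that stays stable when one category dominates, which is one reason the two statistics are often reported together.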