Background: The American Foregut Society (AFS) classification is intended to assess the gastroesophageal junction (GEJ) more comprehensively and accurately than the Hill grade. However, real-world data on its predictive ability and interobserver variability are limited.

Methods: We performed a multi-reader validation study on images collected from patients who underwent endoscopic evaluation for reflux with hernia provocation, graded using both the AFS and Hill classifications. The primary outcome was inter-reader agreement, assessed using Fleiss' κ. Secondary outcomes were the percentage of patients with pathologic reflux at each Hill grade and AFS grade, the correlation between acid exposure time (AET) and the individual components of the AFS grade, and the difference in predictive ability between expert and non-expert graders.

Results: Sixty-four eligible patients were identified. Inter-rater reliability was substantial for the AFS classification (κ = 0.65) and fair for the Hill classification (κ = 0.28). The AFS grade showed a moderate overall correlation with AET (ρ = .36), whereas the Hill grade showed a weak overall correlation (ρ = .28). No single component of the AFS score predicted pathologic reflux significantly better than the overall AFS grade (AFS F = 9.93, P = .0025), and there was no significant difference in overall grading between expert and non-expert readers (ρ = .38 vs ρ = .29).

Conclusion: The AFS score can classify the GEJ with moderate interobserver agreement and is superior to the Hill grade in predicting acid exposure time. The AFS grade has better predictive power than the individual components used in scoring and is robust to differences in reader experience.
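For readers unfamiliar with Fleiss' κ, the multi-rater agreement statistic used for the primary outcome, the following is a minimal sketch of the standard formula. It assumes the ratings are tabulated as a subjects × categories matrix of counts, with every subject rated by the same number of raters. The function name `fleiss_kappa` and the example ratings are hypothetical and are not taken from the study; this is not the authors' analysis code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    Assumes every subject was rated by the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(
        len(row) != n_categories or sum(row) != n_raters for row in counts
    ):
        raise ValueError("each subject needs the same number (>= 2) of raters")

    # Observed agreement per subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical: 4 images, 3 readers each, 3 possible grades.
    ratings = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 1, 2]]
    print(round(fleiss_kappa(ratings), 3))  # → 0.467
```

κ rescales observed agreement against the agreement expected by chance from the marginal category frequencies, so 1 means perfect agreement and 0 means agreement no better than chance.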