Orbital fractures are common, but their management remains controversial. The aim of the present study was to assess the accuracy of an advanced artificial intelligence (AI) model, ChatGPT-4, in surgical decision-making, with a focus on orbital fracture diagnosis and management. A retrospective observational analysis was conducted on a sample of thirty orbital fracture cases diagnosed and managed at the Geneva University Hospital, Switzerland. Patient vignettes were created from anonymised medical records and presented to ChatGPT-4 in three stages: initial diagnosis, refinement with radiological reports, and surgical intervention decisions. The performance of ChatGPT-4 in recommending the appropriate surgical strategy was evaluated in terms of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), with the treatment actually delivered used as the benchmark. ChatGPT-4 correctly diagnosed the fracture in 100% of cases. For treatment recommendation it achieved a sensitivity of 100% and a specificity of 57%, indicating that it reliably recognised patients truly requiring an intervention but showed only moderate performance in correctly identifying cases better suited to conservative treatment. Cohen's kappa for interrater reliability between ChatGPT-4's recommended treatment and the physician's actual choice of treatment was 0.44, indicating a weak level of agreement. The study demonstrates that AI tools such as ChatGPT-4 can diagnose orbital fractures, and recognise patients requiring surgical intervention, with a high degree of accuracy; however, they perform less well in correctly identifying patients better suited to non-surgical treatment.
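As a minimal sketch of how the reported metrics relate to a 2×2 confusion matrix, the function below computes sensitivity, specificity, PPV, NPV and Cohen's kappa from raw cell counts. The example counts (9 surgical cases all flagged for surgery, 21 conservative cases of which 12 were correctly identified) are hypothetical, chosen only because they are one split of thirty cases consistent with the figures the abstract reports; the study's actual confusion matrix is not given here.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard accuracy measures for a binary decision (surgery vs conservative).

    tp: surgical cases correctly recommended for surgery
    fn: surgical cases missed (recommended conservative)
    fp: conservative cases wrongly recommended for surgery
    tn: conservative cases correctly recommended conservative
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return sensitivity, specificity, ppv, npv, kappa


# Hypothetical counts, not the study's actual data:
sens, spec, ppv, npv, kappa = binary_metrics(tp=9, fn=0, fp=9, tn=12)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

With these illustrative counts the function yields a sensitivity of 1.00, a specificity of 0.57 and a kappa of 0.44, matching the pattern of perfect detection of surgical candidates alongside weaker chance-corrected agreement overall.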