INTRODUCTION: Overwhelming evidence indicates that the quality of reporting of predictive models is poor, with studies often compromised by small datasets, inappropriate statistical methods, and a lack of validation and reproducibility. METHODS: Articles reporting prediction models published in the top five neurosurgery journals by SJR2 rank (Neurosurgery, Journal of Neurosurgery, Journal of Neurosurgery: Spine, Journal of NeuroInterventional Surgery, and Journal of Neurology, Neurosurgery, and Psychiatry) between January 1st, 2018, and January 1st, 2023, were identified via a PubMed search strategy that combined terms related to machine learning and prediction models. The identified original research articles were analyzed against the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) criteria. RESULTS: A total of 110 articles were included and analyzed against the TRIPOD checklist. Median compliance among the included articles was 57.4% (50.0-66.7%). Machine learning-based models showed lower median compliance than conventional (non-machine learning) models, although the difference was not statistically significant (57.1% [50.0-66.7%] vs. 68.1% [50.2-68.1%], p = 0.472). Among the TRIPOD criteria, compliance was lowest for blinded assessment of predictors and outcomes (n = 7, 12.7% and n = 10, 16.9%, respectively), provision of an informative title (n = 17, 15.6%), and reporting of model performance measures with confidence intervals (n = 27, 24.8%). Few studies provided sufficient information to allow external validation of their results (n = 26, 25.7%). CONCLUSIONS: Published machine learning models predicting outcomes in neurosurgery often fall short of the guidelines laid out by TRIPOD for model development, validation, and reporting. This lack of compliance may limit the extent to which these models can be externally validated or adopted into routine clinical practice in neurosurgery.