Background: The limitations of the traditional TNM system have spurred interest in multivariable models for personalized prognostication in laryngeal and hypopharyngeal cancers (LSCC/HPSCC). However, the performance of these models depends on the quality of the data and the modelling methodology, which affects their potential for clinical adoption. This systematic review and meta-analysis (SR-MA) evaluated clinical predictive models (CPMs) for recurrence and survival in treated LSCC/HPSCC. We assessed the models' characteristics and methodologies, as well as their performance, risk of bias (RoB), and applicability.

Methods: Literature searches were conducted in the MEDLINE (OVID), Embase (OVID) and IEEE databases from January 2005 to November 2023. The search algorithm used comprehensive text word and index term combinations without language or publication type restrictions. Independent reviewers screened titles and abstracts using a predefined Population, Index, Comparator, Outcomes, Timing and Setting (PICOTS) framework. We included externally validated (EV) multivariable models, with at least one clinical predictor, that provided recurrence or survival predictions. The SR-MA followed the PRISMA reporting guidelines and used the PROBAST framework for RoB assessment. Model discrimination was assessed using the C-index/AUC and was presented for all models using forest plots. Meta-analysis (MA) was performed only for models that were externally validated in two or more cohorts, using a random-effects model. The main outcomes were model discrimination and calibration measures for overall survival (OS) and/or local recurrence (LR) prediction. All measures and assessments were planned before data collection.

Results: The SR-MA identified 11 models, reported in 16 studies. Seven models for OS showed good discrimination on development, with only one excelling (C-index >0.9), and three had weak or poor discrimination. Models that included a radiomics score as a predictor performed relatively better. Most models had poor generalisability, demonstrated by worse discrimination on EV, but they still outperformed the TNM system. Only two models met the criteria for MA, with pooled EV AUCs of 0.73 (95% CI 0.71-0.76) and 0.67 (95% CI 0.6-0.74). RoB was high for all models, particularly in the analysis domain.

Conclusions: This review highlighted the shortcomings of currently available models and emphasized the need for rigorous independent evaluations. Despite the proliferation of models, most exhibited methodological limitations and bias. Currently, no model can confidently be recommended for routine clinical use.

Systematic review registration: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021248762, identifier CRD42021248762.
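
The random-effects pooling of external-validation AUCs described in the Methods can be illustrated with a minimal sketch. The abstract does not state which estimator or transformation was used, so the DerSimonian-Laird estimator, pooling on the raw AUC scale, and recovering standard errors from reported 95% CIs are all assumptions made here for illustration. The function name `pool_random_effects` and the input values in the usage note are hypothetical and are not data from the review.

```python
import math


def pool_random_effects(estimates, ci_lowers, ci_uppers, z=1.96):
    """Pool per-cohort AUC/C-index values with a DerSimonian-Laird
    random-effects model (an assumed estimator, not confirmed by the source).

    Standard errors are approximated from the reported 95% CIs, assuming
    symmetric, normal-theory intervals on the raw AUC scale.
    Returns (pooled, ci_low, ci_high, tau2).
    """
    # Approximate each cohort's variance from its confidence interval width.
    variances = [((u - l) / (2 * z)) ** 2 for l, u in zip(ci_lowers, ci_uppers)]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-cohort variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-cohort variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se, tau2
```

For example, calling `pool_random_effects([0.72, 0.66], [0.68, 0.60], [0.76, 0.72])` with these made-up cohort values returns a pooled estimate that lies between the two inputs, together with its 95% CI and the estimated between-cohort variance. In practice AUCs are often pooled on the logit scale to keep intervals inside (0, 1); the raw scale is used here only to keep the sketch short.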