Health services that function across multiple facilities inevitably experience variation in service characteristics between facilities to accommodate local health settings, nuances and priorities (Partington et al., 2017). In this situation it is recommended that potential sources of variation between facilities are identified to better inform future service planning, ensuring equitable and efficient provision of care (ACSQHC, 2013; Partington et al., 2017). This can be challenging in musculoskeletal care, as systems and processes for meaningful benchmarking of musculoskeletal services are in their infancy (Burgess et al., 2022). One example of a musculoskeletal service recently reported to have variation in outcomes across its numerous facilities is the Neurosurgical and Orthopaedic Physiotherapy Screening Clinic and Multidisciplinary Service (N/OPSC&MDS) (Raymer et al., 2021). The N/OPSC&MDS is an advanced practice physiotherapist-led model of care designed to address overburdened public orthopaedic and neurosurgical medical specialist outpatient services across Queensland, Australia (Moretto et al., 2019). Selected patients with musculoskeletal conditions waitlisted on medical specialist outpatient services are initially assessed by an Advanced Musculoskeletal Physiotherapist. If assessed as potentially amenable to non-surgical management, patients are usually referred for a trial of pragmatic multidisciplinary care (e.g. physiotherapy, psychology, dietetics, occupational therapy, pharmacy, as required) (Cottrell et al., 2018). While operational in 18 facilities (17 hospitals and one community-based facility) under common principles, the services vary in scale and are tailored to the local context, accommodating the local patient case mix as well as health service and organisational priorities and processes.
In a previous audit (between 2012 and 2017) of the N/OPSC&MDS, benefits to overburdened specialist outpatient services were evident in that nearly 70% of patients managed within the service were discharged without requiring specialist medical consultation (the primary service outcome of the N/OPSC&MDS) (Raymer et al., 2021). This figure aligned with 69% of patients in the audit reporting clinically meaningful improvements in their condition (Global Rating of Change [GROC], primary clinical outcome). However, substantial variation in the primary service outcome was observed between N/OPSC&MDS facilities statewide, with 15/18 facilities significantly different to the referent facility. This was despite the primary clinical outcome varying significantly at only 3/18 facilities. Crucially, adjustment for service-related variables inflated variation in the primary service outcome from 10 to 15 facilities, leaving uncertainty regarding other potential sources of facility variation (only 32% explained variance) (Raymer et al., 2021). Given that service planning and quality improvement depend on identifying service-related variables potentially impacting outcomes, a revised set of N/OPSC&MDS standardised state-wide metrics was implemented following the audit. These were based on reviews of the relevant epidemiological (Fehring, 2016; Sangha et al., 2003; Yen et al., 2015), patient-reported outcome measure (Fennelly et al., 2018; Hill et al., 2020; Nicholas et al., 2015), and national and international chronic disease database literature (Clement et al., 2015; ICHOM, 2017; Williams et al., 2016). This short report evaluates the impact of implementing the revised state-wide metrics in better explaining outcome disparities between N/OPSC&MDS facilities, particularly the primary service outcome of discharge pathway.
It is hypothesised that the previously observed variation in the primary service outcome of this multi-facility musculoskeletal service will be better explained by the refined service metrics. Alternatively, if variation in the primary service outcome is not better explained by the refined service metrics, variation may instead reflect differences in service provision between facilities, with implications for future service planning and quality improvement. Adopting the same methods as the previous study (Raymer et al., 2021), the N/OPSC&MDS Measurement Analysis and Reporting System database was audited over a more recent period (1st July 2018 to 30th June 2020) following implementation of the revised metrics at 18 eligible service facilities. As previously, the primary service outcome was dichotomised as either Discharged (discharged from the service with no specialist medical review required) or Specialist RV (reinstated for specialist medical review), and the primary clinical outcome was dichotomised as either Responder (achieved clinically meaningful change; +2 to +5 score) or Non-Responder (did not achieve clinically meaningful change; −5 to +1 score) based on the 11-point GROC scale (Kamper et al., 2009; Raymer et al., 2021). Table 1 presents the revised N/OPSC&MDS state-wide metrics, which incorporate secondary outcome measures and explanatory variables potentially contributing to variation. Data analysis and reporting are consistent with the previous paper (Raymer et al., 2021) to facilitate comparisons between findings. Data were analysed descriptively for all outcomes and potential explanatory variables.
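The GROC dichotomisation rule described above can be sketched as a small function. This is an illustrative reimplementation only (the study's data were processed in SPSS), and the function and label names are assumptions, not part of the service's actual data pipeline.

```python
def dichotomise_groc(score: int) -> str:
    """Dichotomise an 11-point Global Rating of Change (GROC) score
    (-5 to +5) into the primary clinical outcome categories used here:
    +2 to +5 = Responder (clinically meaningful change),
    -5 to +1 = Non-Responder (no clinically meaningful change).
    Illustrative sketch; names are hypothetical."""
    if not -5 <= score <= 5:
        raise ValueError("GROC score must be between -5 and +5")
    return "Responder" if score >= 2 else "Non-Responder"
```

Note the asymmetric cut-point: a score of +1 (slightly better) still counts as a non-response, consistent with the +2 threshold for clinically meaningful change.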
Hierarchical binomial logistic regression models explored variation in the primary service (Discharged, Specialist RV) and clinical (Responder, Non-Responder) outcomes (dependent variables) separately, assessing their relationship with facilities and the potential explanatory service-related variables (independent variables), while additionally accounting for influences of the patient-related explanatory variables. Regression Model 1 evaluated the uncorrected relationship between outcome and facility, with the facility closest to the state average coded as the 'Referent'. Model 2 evaluated the additional impact of the patient-related variables on the relationship between outcome and facility (Field, 2009). Model 3 evaluated the further impact of the explanatory service-related variables of interest. Alpha was set at 0.05 for all statistical analyses. All analyses were undertaken using SPSS v24. A total of 19,106 eligible client records were retrieved, with high rates of completion (>85% of service-related variables, >70% of the patient socio-demographic measures) and minimal variation in data completeness across facilities. Completion rates were lower for the primary (GROC, 54% of those eligible) and secondary (52%–75% at baseline, 20%–57% at discharge) clinical outcome measures. Approximately 54% of patients were either discharged at their initial N/OPSC&MDS assessment or did not require review by the Service Leader, and were therefore not eligible to complete discharge clinical outcomes. Across all facilities, 72.5% of discharged patients did not require a medical specialist review (primary service outcome), although this outcome varied across conditions as shown in Table 2. Across all facilities, 67.1% of patients reported a clinically meaningful response to management within the N/OPSC&MDS (Supplementary Table 1).
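The three-step hierarchical structure (facility only; plus patient-related variables; plus service-related variables) can be sketched as nested model formulas. The analysis itself was run in SPSS; the sketch below builds the analogous patsy-style formula strings one would pass to a logistic regression routine, and all variable names shown are hypothetical placeholders, not the study's actual field names.

```python
def build_hierarchical_formulas(patient_vars, service_vars,
                                outcome="discharged"):
    """Build the three nested logistic-regression formulas:
    Model 1: facility only (Referent as the reference level);
    Model 2: Model 1 + patient-related variables;
    Model 3: Model 2 + service-related variables.
    Returns patsy-style formula strings. Illustrative sketch only."""
    m1 = f"{outcome} ~ C(facility)"
    m2 = m1 + " + " + " + ".join(patient_vars)
    m3 = m2 + " + " + " + ".join(service_vars)
    return m1, m2, m3
```

Comparing facility coefficients across the three models then shows how much of the between-facility difference is absorbed by case-mix (Model 2) versus service-related (Model 3) adjustment.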
Hierarchical binomial regression modelling findings for the primary service (Discharged) and clinical (Responder) outcomes are shown in Table 3 and Supplementary Table 2, respectively. Preliminary analyses demonstrated that many of the patient-related variables (Pain Severity, STarT MSK, Pain Self-Efficacy Questionnaire—Short Form (PSEQ-2), Oswestry Disability Index (ODI), Neck Disability Index (NDI), Quick Disabilities of the Arm, Shoulder and Hand (QDASH), Lower Extremity Functional Scale (LEFS)) were significantly correlated (Spearman's rho 0.43–0.72, p < 0.001). To avoid multicollinearity in the multivariable model, only the Pain Severity, STarT MSK and AQoL-4D variables were carried through to the multivariable analysis, based on their relevance to all conditions (investigators' judgement). The Outpatient Service (e.g. neurosurgical, orthopaedic) and Condition Managed variables were also significantly related (Spearman's rho 0.88, p < 0.001), as were the Management Duration and number of Review Appointments variables (Spearman's rho 0.55, p < 0.001). Therefore, only the Condition Managed and Management Duration variables, respectively, were included in the multivariable analysis. The Box-Tidwell procedure (Box & Tidwell, 1962) was performed using the variables remaining in the final models and the logit of the dependent variables (pathway outcome, clinical outcome). All continuous variables (BMI, Pain Severity, AQoL-4D, SEIFA score) demonstrated linearity (p > 0.05) with the logit of both dependent variables. The three progressive hierarchical binomial regression models for the primary service outcome of discharge pathway (reference: returned to specialist outpatient waitlist) are shown in Table 3 (Clinic 1 was coded as the Referent).
In Model 1, 15 facilities were significantly different to the Referent, reducing to 9 facilities in Model 2 (adjusted for patient-related variables), and increasing to 10 facilities in Model 3 (adjusted for both patient- and service-related variables). Significant service variables in the final model included Management Duration, Triage Category, Non-Attendance at final review, and Medical Specialist Input during N/OPSC management. No outliers were evident for the primary service outcome, with the studentised residual range (SD) (−3.04 to 3.0 [2.09]) within accepted limits (values ≤−5 or ≥5 indicating outliers) based on 26 predictor variables in the final models (Gray & Woodall, 1994). The logistic regression model was statistically significant, χ2(60) = 2730, p < 0.001; it explained 49.7% (Nagelkerke R2) of the variance in pathway outcome and correctly classified 82.2% of cases. The three progressive hierarchical binomial regression models for the primary clinical outcome of GROC (reference: non-response to management) are shown in Supplementary Table 2 (Clinic 16 was coded as the Referent). In Model 1, 6 facilities were significantly different to the Referent, reducing to 4 facilities in Model 2 (adjusted for the patient-related variables), and to 3 facilities in Model 3 (adjusted for both patient- and service-related variables). The only significant service variable in the final model was Medical Specialist Input. No outliers were evident for the primary clinical outcome according to the studentised residuals (range [SD] −2.95 to 2.38 [1.47]) (Gray & Woodall, 1994). The logistic regression model was statistically significant, χ2(58) = 422, p < 0.001; it explained 27.2% (Nagelkerke R2) of the variance in clinical outcome and correctly classified 72.9% of cases. Two facilities (Facilities 4 and 5) had insufficient numbers and were excluded from analysis for Models 2 and 3.
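The Nagelkerke R2 figures reported above (49.7% and 27.2%) are a pseudo-R2 for logistic regression: the Cox and Snell R2 rescaled so its maximum possible value is 1. As a minimal sketch of that formula (SPSS computes it automatically; this illustrative version works from the null- and fitted-model log-likelihoods):

```python
import math

def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke's pseudo-R-squared for a binomial logistic regression.
    ll_null: log-likelihood of the intercept-only model;
    ll_model: log-likelihood of the fitted model;
    n: sample size. Illustrative sketch of the standard formula."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)  # upper bound
    return cox_snell / max_cox_snell
```

Because the raw Cox and Snell statistic cannot reach 1 even for a perfectly fitting model, the rescaled Nagelkerke version is the one conventionally reported, as here.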
The evaluation showed the refined service metrics further explained the disparity in the primary service outcome between N/OPSC&MDS facilities, but only partially. Overall, 10/18 facilities (compared with 15 in the previous study) remained significantly different to the referent facility for the primary service outcome, together with higher explained variance (49.7%, from 32% previously). The small variation in the primary clinical outcome remained the same as in the previous study (only 3/18 facilities), but with slightly improved explained variance (27.2%, from 20% previously) (Raymer et al., 2021). Although sustained benefits of the service over an extended time are evident from the very similar primary service (72.5% discharged, previously 69.4%) and clinical (67.1% clinically meaningful response, previously 68.9%) outcomes of the current and previous audits (Raymer et al., 2021), the persisting variation between facilities irrespective of refined service metrics needs to be understood. When incorporating the revised service metrics, disparity between facilities in the primary service outcome was still inflated (albeit by one facility) when adjusted for service-related variables. Notably, most service-related metrics significant in the final primary service model were also significant in the previous study (Management Duration, Triage Category, Non-Attendance, and Medical Specialist Input during N/OPSC management) (Raymer et al., 2021). Only two newly included service-related variables were significant in the final service model (Multidisciplinary Team Referral, Initiation of Investigations). From the perspective of future service planning, this persisting variation between facilities in the primary service outcome potentially reflects differences in service provision between N/OPSC&MDS facilities.
In response, a collaborative project is now underway between N/OPSC&MDS facilities aimed at identifying and addressing differences in service provision and unwarranted variability between facilities. The mixed-methods project will gain a deeper understanding of service provision and associated service-related variables from a facility perspective. It is anticipated this deeper insight will facilitate adjustment of any modifiable service-related characteristics and service provision, closing the gap in outcomes between facilities. Overall, the findings of this study further highlight the challenges in capturing relevant, high-quality musculoskeletal data. Yet in the interests of patient care, continual refinement of service metrics remains a priority to permit meaningful benchmarking (Burgess et al., 2022), not only within multi-facility services such as the N/OPSC&MDS, but also between musculoskeletal services nationally and internationally. All authors contributed to the concept of the study and the acquisition, analysis and interpretation of data, and were involved in drafting the manuscript. The investigators would like to thank the clinical and administrative staff of the Neurosurgical and Orthopaedic Physiotherapy Screening Clinics and Multidisciplinary Service (N/OPSC&MDS) facilities across Queensland, Australia, who completed the service metrics populating the service database. Open access publishing facilitated by The University of Queensland, as part of the Wiley - The University of Queensland agreement via the Council of Australian University Librarians. The authors have no conflicts of interest to declare. The project received ethical approval from the institute's ethical review committee (HREC/17/QRBW/154). Research data are not shared. The dataset from this study is not publicly available as the data were collated from multiple hospital health services, each with individual data custodians that require further approval for access.
Please contact the corresponding author ([email protected]) regarding any data requests.