Background
Ambulance service quality measures have focused on response times and a small number of emergency conditions, such as cardiac arrest. These quality measures do not reflect the care for the wide range of problems that ambulance services respond to and the Prehospital Outcomes for Evidence Based Evaluation (PhOEBE) programme sought to address this.

Objectives
The aim was to develop new ways of measuring the impact of ambulance service care by reviewing and synthesising literature on prehospital ambulance outcome measures and using consensus methods to identify measures for further development; creating a data set linking routinely collected ambulance service, hospital and mortality data; and using the linked data to explore the development of case-mix adjustment models to assess differences or changes in processes and outcomes resulting from ambulance service care.

Design
A mixed-methods study using a systematic review and synthesis of performance and outcome measures reported in policy and research literature; qualitative interviews with ambulance service users; a three-stage consensus process to identify candidate indicators; the creation of a data set linking ambulance, hospital and mortality data; and statistical modelling of the linked data set to produce novel case-mix adjustment measures of ambulance service quality.

Setting
East Midlands and Yorkshire, England.

Participants
Ambulance services, patients, public, emergency care clinical academics, commissioners and policy-makers between 2011 and 2015.

Interventions
None.

Main outcome measures
Ambulance performance and quality measures.

Data sources
Ambulance call-and-dispatch and electronic patient report forms, Hospital Episode Statistics, accident and emergency and inpatient data, and Office for National Statistics mortality data.

Results
Seventy-two candidate measures were generated from systematic reviews in four categories: (1) ambulance service operations (n = 14), (2) clinical management of patients (n = 20), 
(3) impact of care on patients (n = 9) and (4) time measures (n = 29). In the first three categories, the most common measures were call triage accuracy for operations, adherence to care protocols for clinical management and survival for patient outcomes. Excluding time measures, nine measures were highly prioritised by participants taking part in the consensus event, including measures relating to pain, patient experience, accuracy of dispatch decisions and patient safety. Twenty experts participated in two Delphi rounds to refine and prioritise measures and 20 measures scored ≥ 8/9 points, which indicated good consensus. Eighteen patient and public representatives attending a consensus workshop identified six measures as important: time to definitive care, response time, reduction in pain score, calls correctly prioritised to appropriate levels of response, proportion of patients with a specific condition who are treated in accordance with established guidelines, and survival to hospital discharge for treatable emergency conditions. From this we developed six new potential indicators using the linked data set, of which four were constructed using case-mix-adjusted predictive models: (1) mean change in pain score; (2) proportion of serious emergency conditions correctly identified at the time of the 999 call; (3) response time (unadjusted); (4) proportion of decisions to leave a patient at scene that were potentially inappropriate; (5) proportion of patients transported to the emergency department by 999 emergency ambulance who did not require treatment or investigation(s); and (6) proportion of ambulance patients with a serious emergency condition who survive to admission, and to 7 days post admission. Two indicators (pain score and response times) did not need case-mix adjustment. 
Among the four adjusted indicators, we found that accuracy of call triage was 61%, rate of potentially inappropriate decisions to leave at home was 5–10%, unnecessary transport to hospital was 1.7–19.2% and survival to hospital admission was 89.5–96.4% depending on Clinical Commissioning Group area. We were unable to complete a fourth objective to test the indicators in use because of delays in obtaining data. An economic analysis using indicators (4) and (5) showed that incorrect decisions resulted in higher costs.

Limitations
Creation of a linked data set was complex and time-consuming and data quality was variable. Construction of the indicators was also complex and revealed the effects of other services on outcome, which limits comparisons between services.

Conclusions
We identified and prioritised, through consensus processes, a set of potential ambulance service quality measures that reflected preferences of services and users. Together, these encompass a broad range of domains relevant to the population using the emergency ambulance service. The quality measures can be used to compare ambulance services or regions or measure performance over time if there are improvements in mechanisms for linking data across services.

Future work
The new measures can be used to assess different dimensions of ambulance service delivery but current data challenges prohibit routine use. There are opportunities to improve data linkage processes and to further develop, validate and simplify these measures.

Funding
The National Institute for Health Research Programme Grants for Applied Research programme.