Abstract

How meaningfully to evaluate expert clinical guidelines for the management of specific diseases is an interesting question. Ideally, researchers would randomly allocate cases of the disease to management by clinicians adhering to the guidelines under study, follow the cases up, and measure and compare outcomes. If the study were done well, we might be able to conclude which clinical management guideline was superior. Unfortunately, such studies are unlikely to be done, and I am not sure they would be worth the cost of performing them: different clinical management guidelines for the same disease usually share enough similarities, with only a few disputed or unresolved areas, that the expected differences in outcomes would be very minor and difficult to measure.

In lieu of experimentally evaluating guidelines, researchers have taken to describing how clinical guidelines meet published criteria for effective guideline development. The assumption is that those criteria have clinical or external validity; that is, that they are associated with medical outcomes. In this issue of Sexual Health, authors from the UK have used the Cluzeau and AGREE instruments to evaluate national guidelines for the management of sexually transmitted diseases from the United States Centers for Disease Control and Prevention (CDC) and the British Association for Sexual Health and HIV (BASHH). The 37-item Cluzeau and 23-item AGREE instruments are very similar, the AGREE having evolved from the Cluzeau.1,2 The CDC guidelines can be found at www.cdc.gov/STD/treatment and were most recently published in 2006. The BASHH guidelines are available at http://www.bashh.org/guidelines.asp and appear to be updated on a disease-specific basis.
For example, the BASHH guideline for the management of genital tract infection with Chlamydia trachomatis was updated in 2006, whereas the BASHH guideline for the management of early syphilis was updated in 2002. The availability and dissemination of guidelines via the Internet offer the opportunity for focussed and timely updates, such as the CDC recommendation to avoid fluoroquinolones in the treatment of Neisseria gonorrhoeae in the USA [http://www.cdc.gov/STD/treatment/2006/updated-regimens.htm] and the BASHH update about the availability of procaine penicillin [http://www.bashh.org/guidelines/penicillin update 0306.pdf].

The guideline criteria used by Baird et al.3 cover various domains in the process of guideline development, with a focus on transparency or ‘rigour of development’ (clear reporting of funding, methodology, potential financial conflicts of interest, etc.), inclusiveness (involvement of stakeholders such as patients and clinical personnel), accountability (description of who exactly the authors are and their expertise) and process (disclosure of the writing and review schedule, and dissemination activities). Each domain was equally weighted as the authors of the study created summary scores for each guideline characteristic. Some consideration in the AGREE instrument was given to how the evidence base was examined, but attention to the type, level and strength of evidence was not evaluated. That is, the instruments do not score the evidence base used in the guidelines: what proportion of the evidence comes from randomised clinical trials, observational studies or expert opinion. That lack of attention to the quality of the evidence does not appear to be uncommon.
One study reported that none of 24 appraisal tools for practice guidelines evaluated the clinical evidence base used to create the content of the guidelines assessed.4 The authors find that the BASHH guidelines they evaluated – which were developed in accordance with AGREE – had higher summary and individual domain scores than the CDC guidelines on similar topics. In the area of ‘rigour’, which might be consistent with the use of the available clinical evidence, the CDC guidelines consistently scored lower; however, that area includes multiple measures related to adequate articulation of the process for evaluating evidence rather than the quality of the evidence itself. The major differences between the guidelines lay in how each adhered to the AGREE criteria regarding transparency, inclusiveness, accountability and process.

Fortunately, the authors do not conclude superiority of one set of national guidelines over the other, but allow readers to infer for themselves that the guidelines with the higher score were superior. That logic is slippery at best and fallacious at worst. That the BASHH guidelines used the AGREE criteria as a framework should come as no surprise: when they are compared with guidelines that did not use the AGREE criteria, one finds a higher score consistent with better adherence to predetermined criteria. In fact, one may rather wonder why the BASHH guidelines did not score better, and why the CDC guidelines scored as well as they did. When one actually looks at the guidelines and compares clinical management recommendations, one finds multiple similarities and a few potentially important differences. For example, in the management of genital chlamydial infection
