Diagnosis and management of disease is increasingly driven by international efforts at consensus statements based on evidence of best practice. For endocrinology, diagnostic criteria based on the results of biochemical testing have for many years formed the mainstay of objective decision making, a recognition that biomarkers correlate better with morbidity and mortality risk than do signs, symptoms, and anthropometric parameters. Such testing becomes of increasing significance in an era of closer clinical scrutiny and accountability, and dictates the need for accurate, reliable results. But how strong is the evidence base for the diagnostic cutoff levels deployed in, for example, the investigation of GH secretion disorders? Our conclusion from reviewing the performance of GH assays is that the evidence base is fragile. Thus, Seth et al. (1) reported a 2-fold difference in values generated by the eight methods used by 85 participants in the United Kingdom National External Quality Assessment Scheme for GH. Where the most negatively biased method reports a value of 19.7 mU/liter (the preferred reporting unit in the United Kingdom), the most positively biased reports 38.1 mU/liter. From 1994 to 1998, variability across United Kingdom laboratories worsened from 18 to 30%, and although this has now stabilized, the intermethod bias persists. These observations are not new (2, 3), and are seen whenever a diversity of calibrants and/or antibodies is deployed. Other endocrine assays such as TSH (4), LH, and PTH, with interlaboratory variability of 12.5, 15.3, and 26.9%, respectively, during 2006, also provide examples (United Kingdom National External Quality Assessment Scheme, personal communication). Typically, however, little has been achieved in reducing the variation in results due to these issues. What are the consequences of such issues not being addressed? For GH, Andersson et al.
(5) illustrated how laboratories transfer cutoffs to new methods while ignoring known biases in the process, in effect sustaining historical databases that are of little diagnostic merit. Ellis et al. (6) illustrated the impact of using out-of-date cutoffs with the outcome of an interpretive exercise for an insulin tolerance test for GH deficiency, in which 10% of 52 laboratories would have reported an “equivocal response” or “partial deficiency,” even though the mean peak GH level was 32.8 mU/liter. The use of a variety of factors for converting μg/liter to mU/liter is an additional unnecessary complexity that hinders data interpretation. Recently, Pokrajac et al. (7) demonstrated how 86, 55, or 11% of 104 reports of GH nadirs during an oral glucose tolerance test would be compatible with acromegaly, depending on which of the commonly used conversion factors (2.0, 2.6, or 3.0, respectively) was applied. Although conversion factors between kit calibrants and international reference materials should relate to the content of the latter, this is frequently unknown or undefined [as with pituitary-derived International Standard (IS) 80/505], and most authors using conversion factors are unaware of which reference material may be in use. Other issues are also relevant. For the clinical and laboratory communities, accumulating sufficient data for evidence-based cutoffs presents real challenges; many endocrine disorders are of low incidence (10 cases per million for isolated GH deficiency in children in the United Kingdom, three to four cases per million for acromegaly), and methods are often in use for too limited a period to allow a database to build up. For the diagnostics industry, the challenge is one of maintaining market share by balancing use of technologies favored by customers against the costs (to shareholders) of modifying their products. In this regard, the goal of higher quality is but one factor relevant to maintaining market share.
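The sensitivity of interpretation to the choice of conversion factor, noted above, can be illustrated with a short sketch. It assumes that a result reported in mU/liter is converted to μg/liter by dividing by a factor expressed in mU per μg. The nadir value and the cutoff used here are hypothetical numbers chosen only for illustration; they are not taken from the cited studies or from any guideline.

```python
# Conversion factors named in the text, treated here as mU per microgram
# (an assumption about how the factors are applied).
CONVERSION_FACTORS = (2.0, 2.6, 3.0)

# Hypothetical inputs, for illustration only (not from the cited studies).
HYPOTHETICAL_NADIR_MU_PER_L = 2.4   # GH nadir as reported, in mU/liter
HYPOTHETICAL_CUTOFF_UG_PER_L = 1.0  # cutoff expressed in ug/liter


def mu_to_ug(value_mu_per_l: float, factor: float) -> float:
    """Convert a GH result from mU/liter to ug/liter by dividing by the factor."""
    return value_mu_per_l / factor


def exceeds_cutoff(value_mu_per_l: float, factor: float, cutoff_ug_per_l: float) -> bool:
    """Return True if the converted result lies above the cutoff."""
    return mu_to_ug(value_mu_per_l, factor) > cutoff_ug_per_l


# Classify the same reported result under each conversion factor.
classification = {
    factor: exceeds_cutoff(
        HYPOTHETICAL_NADIR_MU_PER_L, factor, HYPOTHETICAL_CUTOFF_UG_PER_L
    )
    for factor in CONVERSION_FACTORS
}

for factor, above in classification.items():
    converted = mu_to_ug(HYPOTHETICAL_NADIR_MU_PER_L, factor)
    print(f"factor {factor}: {converted:.2f} ug/liter, above cutoff: {above}")
```

Under these assumed numbers, the same reported result falls above the cutoff with a factor of 2.0 and below it with factors of 2.6 and 3.0, which is the kind of discordance the text describes.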
For clinicians working in relative isolation from their laboratories, a failure to appreciate the weakness of the evidence bases behind many cutoffs in turn prevents recognition of the weakness of the consensus statements in which they are promulgated. Should we be surprised at the current status quo? Probably not. However, more could be done to resolve some of these issues through a wider understanding of individual agendas and a willingness to adopt pragmatic approaches. The wish to meet patients’ needs provides the common starting point. Pragmatic approaches strike a balance between purists’ aspirations for wholly accurate and reproducible assays, the diagnostics industry’s need to maintain customer share in a commercial market, and an often uncritical adoption by clinicians of new cutoffs as “gold standards.” This more pragmatic approach has been adopted by an international collaborative between the clinical/laboratory communities and the diagnostics industry for harmonizing GH measurement. It follows on from attempts in Japan (8) to introduce a recombinant material adjusted to World Health Organization (WHO) IS 88/624, an international GH standard whose stock has now been exhausted. Drivers for the international collaborative have included the emergence of a new interna
Abbreviation: IS, International Standard.