Abstract

In the roundtable that follows, clinicians discuss a study published in this issue of the Journal in light of its methodology, relevance to practice, and implications for future research.

Article discussed: Trautmann GM, Kip KE, Richter HE, et al. Do short-term markers of treatment efficacy predict long-term sequelae of PID? Am J Obstet Gynecol 2008;198:30.e1-30.e7.

■ What is the research design of this investigation?
■ What were the primary outcomes?
■ What are the limitations of unplanned or secondary analyses?
■ How do current diagnostic recommendations differ from the eligibility criteria?
■ Was the sample size adequate?
■ How might a high rate of loss-to-follow-up affect the results?
■ How did researchers determine that tenderness at 30 days did not predict long-term sequelae?
■ How were the positive and negative predictive values determined?
■ What are the strengths of this study?
■ Will the results affect clinical practice?

Pelvic inflammatory disease (PID) is common and includes upper female genital tract infections such as endometritis, salpingitis, and tubo-ovarian abscesses. Approximately 770,000 cases are diagnosed annually in the United States (ref. 1: Sutton MY, Sternberg M, Zaidi A, St. Louis ME, Markowitz LE. Trends in pelvic inflammatory disease hospital discharges and ambulatory visits, United States, 1985-2001. Sex Transm Dis 2005;32:778-784). Although PID is usually caused by sexually transmitted organisms, normal genital flora have also been implicated (ref. 2: Haggerty CL, Ness RB. Newest approaches to treatment of pelvic inflammatory disease: a review of recent randomized clinical trials. Clin Infect Dis 2007;44:953-960). The Pelvic Inflammatory Disease Evaluation and Clinical Health (PEACH) Randomized Trial showed that inpatient and outpatient treatments are equally effective, but despite the availability of reliable antibiotics, long-term sequelae remain widespread, costing almost $2 billion per year (ref. 3: Ness RB, Soper DE, Holley RL, et al. Effectiveness of inpatient and outpatient treatment strategies for women with pelvic inflammatory disease: results from the Pelvic Inflammatory Disease Evaluation and Clinical Health (PEACH) Randomized Trial. Am J Obstet Gynecol 2002;186:929-937; ref. 4: Rein DB, Kassler WJ, Irwin KL, Rabiee L. Direct medical cost of pelvic inflammatory disease and its sequelae: decreasing, but still substantial. Obstet Gynecol 2000;95:397-402). This month, Journal Club members discussed a study that addressed whether short-term treatment markers predict the risk of lasting consequences of PID.

Madden: What is the research design of this investigation?

Despotovic: This research study is a secondary analysis of a subgroup of subjects from the PEACH trial. Published in 2002, the PEACH trial compared reproductive outcomes in patients randomized to inpatient or outpatient treatment of PID (ref. 3). No differences in long-term outcomes were found between the treatment arms, and the groups were combined into a single prospective cohort study. The patients were then followed to evaluate the long-term sequelae of PID. Subjects included women diagnosed with mild to moderate PID who had pelvic discomfort for a period of 30 days or less; pelvic organ tenderness on bimanual exam; leukorrhea and/or mucopurulent cervicitis; and/or untreated documented gonococcal or chlamydial cervicitis. Five days after treatment was begun, all were assessed for tenderness; 30 days post-treatment initiation, they were evaluated for tenderness, cervical infection, and endometritis. The subjects were followed with telephone interviews at 3-month intervals during the first year and 4-month intervals after that, for a mean follow-up of 6 years. The secondary analysis under discussion today included 713 subjects, for whom there was complete information at the 5-day visit.

Madden: Why does Table 1 show only 1 study group?

Despotovic: Table 1, which shows the characteristics of the participants in this recent analysis, includes 1 group because the treatment to which the subjects were originally randomized was not associated with a difference in reproductive outcomes; therefore the 2 groups were combined and analyzed as a whole.

Madden: How would you state the null hypothesis for this study?

Schaecher: Short-term markers of PID, including tenderness at 5 or 30 days and evidence of microbiologic cure, do not predict long-term sequelae, such as lack of pregnancy, recurrent PID, and chronic pelvic pain.

Madden: What were the primary outcomes in this analysis?

Houston: This study evaluated the predictive relationship between short-term clinical markers of PID and long-term clinical sequelae of the disease.
The outcomes showed that specific short-term markers (tenderness at 5 and 30 days post-treatment, cervical infection at 30 days post-treatment, and endometritis at 30 days) do not correlate significantly with the development of long-term sequelae, including infertility, PID recurrence, and chronic pelvic pain. These seem like very reasonable and clinically relevant endpoints.

Madden: What are the limitations of unplanned or secondary analyses?

Houston: In the PEACH trial, similar long-term sequelae were used as primary outcome endpoints but with respect to a different exposure. The impact of inpatient versus outpatient treatment on short-term outcomes, such as side effects and need to change therapy, was evaluated. In addition, the effects of inpatient and outpatient treatment on long-term sequelae of PID, including infertility, PID recurrence, chronic pelvic pain, ectopic pregnancy, and tubal occlusion, were examined. The limitations of secondary analyses arise from the original intent, design, and power of the parent study. Sometimes, secondary analyses that use only a portion of the original data can have a problem with statistical power. One might also be concerned that data collection was less ideal for secondary questions than it had been for the primary hypothesis. Lastly, if multiple secondary analyses are performed, a statistically significant finding might occur by chance alone, an event known as an “alpha error” (a Type I error). With all of those limitations in mind, I think that secondary data analysis can be very important in clinical research.

Madden: How was PID diagnosed in the PEACH trial?
Schaecher: For the purposes of the PEACH trial, PID was defined as:
• A history of pelvic discomfort for a period of 30 days or less
• Pelvic organ tenderness (uterine or adnexal) on bimanual examination
• Leukorrhea (white blood cells exceeded epithelial cells in a sample examined microscopically) or mucopurulent cervicitis (grossly yellow or green exudate was obtained with a cervical swab) or both and/or untreated but documented gonococcal or chlamydial cervicitis

Madden: What are the current CDC recommendations for the diagnosis of PID, and how are they different from the eligibility criteria for this study?

Hladky: The Centers for Disease Control and Prevention (CDC) recommends empiric treatment of PID for sexually active young women and other women at risk for sexually transmitted diseases if they have pelvic or lower abdominal pain; if no other cause for the illness can be identified; and if 1 or more of the following minimum criteria are present on pelvic examination: cervical motion tenderness OR uterine tenderness OR adnexal tenderness (ref. 5: Centers for Disease Control and Prevention. Sexually transmitted diseases treatment guidelines 2006: pelvic inflammatory disease. http://www.cdc.gov/std/treatment/2006/pid.htm). Because PID is difficult to diagnose and even apparently mild or subclinical PID can damage the reproductive health of women, the CDC advises “a low threshold for the diagnosis of PID.” The PEACH trial requirement that all 3 minimum criteria be present before the initiation of empiric treatment could result in a decreased sensitivity for the diagnosis of PID, at least when compared to what we use in practice. Relying on signs of lower genital tract inflammation in combination with 1 of the 3 minimum criteria to identify PID increases the specificity of diagnosis. In summary, the minimum requirements for a diagnosis of PID do differ between the CDC and PEACH.
The PEACH requirements are stricter, including some of the “additional criteria” suggested by CDC for diagnosis. This suggests that the patients enrolled in the PEACH trial were sicker than the average patient diagnosed with PID. It is possible that in general clinical practice, we are treating women who have had the disease for a shorter duration or who have less severe disease compared with the women enrolled in this study. According to the authors, women who delay in seeking treatment for PID have a 3-fold higher risk of infertility, so perhaps these patients had more advanced disease.

Madden: What short-term markers did they look for at the 5-day visit versus the 30-day visit? What is the summary marker?

Despotovic: The short-term marker evaluated at the 5-day examination was tenderness, which was assessed using a 36-point scale. The short-term markers evaluated at the 30-day examination included tenderness assessed by the same 36-point scale and cervical infection, which was demonstrated with positive results from a culture for Neisseria gonorrhoeae, polymerase chain reaction for Chlamydia trachomatis, Gram-stain detection of bacterial vaginosis, and/or aspiration of the endometrium for detection of gonorrhea or chlamydial infection. An additional summary marker included the presence of any of the 30-day markers: tenderness, cervical infection, or endometritis. The summary marker, along with the 5-day and 30-day short-term markers, was analyzed for association with chronic pelvic pain, recurrent PID, and reproductive outcomes.

Madden: Why were they able to collapse the 3 separate 30-day markers into a single summary marker?

Despotovic: I believe they could create a single marker because they were able to show that those variables were not significantly correlated with each other.

Madden: Was the sample size of the study adequate?

Hladky: This study had a fixed sample size based on follow-up data available at 5 days (n=713) and 30 days (n=298).
Given this fixed sample size, they calculated the smallest hazard ratios that could be detected with 80% power. At 5 days, the study had enough power to detect hazard ratios of 0.71 for pregnancy, 1.38 for chronic pelvic pain, and 1.36 for recurrent PID. At 30 days, the power was sufficient to detect hazard ratios of 0.52 for pregnancy, 1.95 for pain, and 1.88 for recurrent PID.

Madden: What are alpha and beta in sample-size calculations? Why do we not set alpha at a very, very small number?

Hladky: In this study, alpha was set at 0.05 and beta at 0.2, corresponding to an 80% power. Alpha refers to the probability of a Type I error (finding a difference when one does not really exist, or a false positive result), while beta refers to the probability of making a Type II error (failing to find a difference when one does exist, or a false negative result). The power of a study is calculated as 1 - beta. As alpha decreases, the sample size needed to maintain adequate power increases, so a very small alpha would require a larger and larger sample.

Madden: What were the follow-up rates at 5 days versus 30 days? How might a high rate of loss-to-follow-up affect the study results?

Houston: There were 713 women eligible for enrollment who returned for the scheduled follow-up visit at 5 days and who had complete data at that time. There was a significant loss to follow-up, with only 298 women having complete data at 30 days. The data included tenderness assessment, cervical swab for gonorrhea and chlamydia, and endometrial aspirates. One could hypothesize about an array of reasons why women failed to return for the 30-day visit. For example, they might be feeling too ill to come in, or they might be feeling well enough to assume additional follow-up was not necessary. These 2 different possibilities would potentially affect the outcomes in opposite ways. I think the epidemiological word for this is bias.
It is hard to know if and how this bias could affect the study results.

Madden: One limitation of the study is that the investigators diagnosed repeat PID by self-report. How do they justify this decision?

Despotovic: Self-report of recurrent PID has limitations, including recall bias, inability to recall the correct diagnosis, inaccuracy of diagnosis, unrecognized subclinical infection, and unwillingness to admit recurrence. The researchers justify the decision to use self-report of recurrent PID by noting that previous investigators compared self-report of recurrent PID to medical records and found that rates of recurrent PID by self-report and by medical record review were similar. So, I believe that this was a reasonable thing to do.

Madden: What is the Kaplan-Meier method?

Allsworth: The Kaplan-Meier method is a common technique for the estimation of the survival function or the probability of surviving event-free past a specific time. In this study, Kaplan-Meier curves were used in the estimation of the cumulative incidences of chronic pelvic pain, recurrent PID, and pregnancy in relation to the short-term markers of treatment efficacy. This approach allows for the estimation of the cumulative incidence, or the risk for disease during a particular time period (which is typically estimated as the ratio of the number of events divided by the sample size), in the presence of unequal follow-up times. This is possible when the probability of an event is stable throughout the study, the probability of incidence is independent of censoring, and the probability of an event in a particular time period is independent of previous time periods, all of which are reasonable assumptions for this study.

Madden: What is a hazard ratio? Why do they use these statistical methods rather than an odds ratio or relative risk?
Allsworth: The hazard ratio is an approximation of the incidence rate ratio usually obtained from a Cox proportional hazards regression model, another survival analysis technique. Both the Kaplan-Meier and Cox proportional hazards regression methods are appropriate for time-to-event data like those in the PEACH cohort. A relative risk (the ratio of the cumulative incidence in those with the short-term marker of interest versus those without the marker) is conceptually similar to the approximation of the incidence rate ratio obtained with the hazard ratio. Yet, it is less powerful since it ignores length of observation and censoring. The odds ratio, under certain conditions, could also be an approximation of the relative risk. However, in this study, where all of the outcomes of investigation are common (experienced by more than 10% of the population), the odds ratio is likely to be an overestimate of the relative risk and is therefore not appropriate.

Madden: If you look at the values listed in Table 1, you can see that 624 women returned for the 30-day follow-up visit; however, the authors only included 298 women in the analysis. Why?

Schaecher: Table 1 shows that 624 patients returned for evaluation of tenderness at 30 days, but complete data for all 4 covariates were available only for 298 women.

Madden: If the investigators had chosen to look at the 30-day tenderness as an independent marker, they would have had a larger sample size to detect a difference; they would have had 624 women. But because they required complete information at the 30-day visit, they greatly reduced their available sample size.

Madden: The researchers found that tenderness at 30 days was significantly associated with risk of recurrent PID and chronic pelvic pain, whereas 5-day tenderness, cervical infection, and endometritis were not. Yet, in the conclusion they say that tenderness at 30 days does not predict PID-related morbidities. How did they come to this conclusion?
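
As a supplementary illustration of two statistical points raised by Allsworth above, the following is a minimal Python sketch, not the authors' analysis. It shows (1) a hand-rolled Kaplan-Meier estimate of event-free survival with censored follow-up and (2) how the odds ratio drifts away from the relative risk when an outcome is common. The function names and all counts and follow-up times are hypothetical and chosen only for arithmetic convenience; none come from the PEACH data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times  : follow-up time for each subject (any consistent unit)
    events : 1 if the outcome occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # outcomes observed exactly at time t
        n_leaving = 0  # everyone (events + censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def relative_risk_and_odds_ratio(events_exposed, n_exposed,
                                 events_unexposed, n_unexposed):
    """Return (relative risk, odds ratio) from cumulative counts.

    Assumes both groups have at least one event and one non-event,
    so that no division by zero occurs.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Hypothetical follow-up: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1]):
        print(f"time {t}: event-free survival {s:.3f}")

    # Hypothetical common outcome (40% vs 20%) and rare outcome (4% vs 2%).
    for label, counts in [("common", (40, 100, 20, 100)),
                          ("rare", (4, 100, 2, 100))]:
        rr, odds_ratio = relative_risk_and_odds_ratio(*counts)
        print(f"{label}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

With these made-up counts the relative risk is 2.0 in both scenarios, while the odds ratio is about 2.67 for the common outcome and about 2.04 for the rare one, which is the overestimation Allsworth describes. The sketch does not fit a Cox model; in practice a survival-analysis library would be used for hazard ratios.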
