Abstract

There are a number of reasons to carry out a meta-analysis (Table 1). With the rapid growth of medical literature, it has become difficult, if not impossible, for clinicians to review and interpret the results of all pertinent studies in order to make sound clinical decisions. Clinicians and policy makers must therefore rely on reviews to summarize the results of studies. Traditionally, this role has been fulfilled by narrative reviews written by recognized experts in the particular field. However, narrative reviews are inherently biased. In contrast, systematic reviews are designed to summarize the results of a large number of studies and to eliminate as much bias as possible in their interpretation. The use of quantitative methods in meta-analysis is designed to aid in this process. Quantitative methods cannot, however, rescue a systematic review that is flawed or biased; the adage "garbage in, garbage out" applies to meta-analyses of studies drawn from poorly conducted systematic reviews as well. On the other hand, a narrative review is even more susceptible to potential bias, because no justification is required for excluding or discounting studies arbitrarily.

Table 1. Reasons for doing a meta-analysis.

Most often, meta-analysis is used to test a particular hypothesis that was tested in a number of separate clinical trials. In this case, meta-analysis serves to increase the statistical power of studies by increasing the combined sample size. However, meta-analysis can also increase the generalizability of the results of individual studies by examining the overall effect of a therapy in a number of different clinical settings and patient populations.

Often overlooked is the hypothesis-generating capability of meta-analysis. This is analogous to the post-hoc subgroup analysis that is often carried out on the results of an individual clinical trial. If there is a sufficiently large number of studies, it may be possible to correlate the effect of a therapy with the way in which therapy was delivered (such as type of drug or timing of the therapy) or with patient population characteristics (such as age, gender, race, or type of underlying disease). For example, it may not be apparent from a single trial that a particular therapy is more effective in men than in women, but a meta-analysis that includes separate studies in men and women may suggest that this is the case. This may then suggest the need for additional studies directly comparing the effect of the therapy in men versus women. Looking for reasons for differences in the results of clinical trials is rarely conclusive. As with subgroup analysis of clinical trial results, the possibility of finding associations due to chance increases with the number of associations examined. Hence, this type of analysis should be considered hypothesis generating rather than hypothesis testing.

The results of meta-analysis can also be used to aid in the design of large-scale trials. For example, the overall effect of a therapy calculated across several different studies in a meta-analysis can be used to estimate the sample size needed for a large, multicenter trial, as the sketch below illustrates. In addition, as suggested above, correlations between various study or patient characteristics and the effectiveness of treatment in a meta-analysis may suggest the need for additional studies in targeted populations. A good meta-analysis is usually the result of a close collaboration between clinician and statistician [4].
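To make the sample-size point concrete, here is a minimal Python sketch of how a pooled effect size from a meta-analysis might feed into planning a new trial. It uses the standard normal-approximation formula for comparing two means; the pooled standardized effect of 0.30 is purely hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per arm to detect a standardized effect size
    in a two-arm trial, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

pooled_d = 0.30  # hypothetical pooled standardized effect from a meta-analysis
print(n_per_group(pooled_d))  # roughly 175 subjects per arm for 80% power
```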
However, it is helpful for clinicians to have some understanding of meta-analysis techniques. Here I will give a brief overview; more details can be found in the references and in textbooks [5, 6].

A number of techniques have been developed to combine the results of clinical trials. Most meta-analyses combine simple and unambiguous outcomes, but this need not always be the case. If the reports of primary studies adequately describe a complicated outcome, such reports can usually be combined in a quantitative fashion.

Vote counting is perhaps the simplest technique, but it has major limitations [7]. Vote counting simply sums the number of studies with a "yes" or "no" result, and the winner takes all. An inherent limitation of vote counting is that it does not take into account the "margin of victory": the result is just as positive if 51 out of 100 studies report a positive effect as it is if 99 out of 100 studies are positive. Likewise, vote counting does not take into account the number of subjects in the individual trials; a study with 10 participants counts just as much in the final result as a study with 10,000 participants. In addition, vote counting has limited statistical power, because it does not take into account the magnitude of the treatment effect within each study. Vote counting may only be useful when the studies do not allow calculation of a treatment effect.

There are several different methods for calculating the difference between treatment and control, that is, the magnitude of the treatment effect or the effect size. The effect size can be based on dichotomous outcomes (such as risk differences, relative risks, or odds ratios) or on numeric values (such as differences between treatment and control, or correlations). When different units of measure are used in different studies, it may be necessary to normalize the effect sizes before combining them. One method is to divide the difference between treatment and control by the SD; a particular treatment effect then becomes the number of SDs above (or below) control. This has the advantage of allowing trials reporting outcomes in different units ("apples and oranges") to be combined, but the disadvantage of making it difficult for clinicians to judge the clinical relevance of the magnitude of the combined treatment effect. Effect sizes can also be calculated from P values.

The treatment effects of individual studies can be combined using one of several techniques. Generally, the treatment effect of each individual study is given a pre-determined weight, so that studies do not contribute equally to the final result (unlike vote counting). The most common method is to weight the treatment effect in each study by its inverse variance. This gives relatively more weight to studies with less variability and to studies with larger numbers of subjects. Weighting by inverse variance has the advantage of being objective.
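As an illustration of these two steps (normalizing by the SD, then weighting by inverse variance), the following Python sketch pools three hypothetical studies under a fixed effects model. The per-study numbers are invented, and the variance formula for the standardized difference is the usual large-sample approximation.

```python
import math

# Hypothetical per-study summaries: mean difference between treatment and
# control, pooled SD, and the number of subjects in each arm.
studies = [
    {"diff": 4.0, "sd": 10.0, "n_t": 40,  "n_c": 40},
    {"diff": 2.5, "sd": 9.0,  "n_t": 120, "n_c": 115},
    {"diff": 6.0, "sd": 12.0, "n_t": 25,  "n_c": 30},
]

effects, variances = [], []
for s in studies:
    d = s["diff"] / s["sd"]  # standardized effect: SDs above (or below) control
    n_t, n_c = s["n_t"], s["n_c"]
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))  # approx. variance of d
    effects.append(d)
    variances.append(var_d)

# Fixed effects pooling: each study is weighted by its inverse variance, so
# precise (usually larger) studies dominate the combined estimate.
weights = [1 / v for v in variances]
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
se = math.sqrt(1 / sum(weights))
print(f"pooled effect = {pooled:.3f} SD units, "
      f"95% CI {pooled - 1.96 * se:.3f} to {pooled + 1.96 * se:.3f}")
```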
The same cannot be said of another common approach, weighting studies by quality. Quality is a feature that lies more in the eye of the beholder, as evidenced by the large number of different indexes that have been proposed to judge study quality. Most investigators agree on a number of characteristics that suggest a good study, such as masking of subjects, random allocation, and use of intention-to-treat analysis. However, it is more difficult to judge the relative merit of these different features and thereby derive a single composite quality index. Should the masking of subjects be given equal weight to random allocation? There is as yet no consensus on how best to weight studies for quality. It is also possible to weight studies using both inverse variance and quality.

The optimal method for combining the treatment effects of studies has been much debated. Two methods (or "models") are used most often: the fixed effects and the random effects models. The fixed effects model assumes that the studies being combined are homogeneous, and that studies differ only because each uses a sample of observations rather than the whole population of observations. The random effects model does not make that assumption; it assumes only that the sample of studies is representative of a larger population of all such studies. In the random effects model, differences among studies are due not only to sampling of the treatment effects, but also to differences in the studies themselves [8]. Thus, the fixed effects model can be thought of as giving a result pertinent to "these studies," compared to a result for "studies like these" from the random effects model [9]. The confidence intervals resulting from combining studies with the random effects model are typically wider (indicating more uncertainty about the result), and are never narrower than those from the fixed effects model. In the fixed effects model, larger studies receive relatively more weight than smaller studies, compared to the random effects model. The random effects model may be more appropriate for combining large numbers of trials, while the fixed effects model may be better suited to combining a small number of studies, such as two or three. Although the fixed and random effects models are most commonly used, other approaches to combining studies have been developed [10, 11].

Meta-analysis, especially using the fixed effects model, assumes that trials are homogeneous in every way that could affect the outcome. This assumption can and should be tested [8].
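The DerSimonian–Laird method [8] is the classic implementation of the random effects model. The sketch below shows the core computation in Python: the between-study variance (tau-squared) is estimated from Cochran's Q, the same statistic used to test homogeneity, and is then added to each study's own variance before the inverse-variance weights are formed. The input numbers are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effects under the DerSimonian-Laird random effects model.
    Returns the pooled effect, its SE, Cochran's Q, and the tau^2 estimate."""
    w = [1 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed effects mean.
    # It doubles as the homogeneity test statistic (chi-square, k - 1 df).
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at zero
    w_star = [1 / (v + tau2) for v in variances]  # weights now absorb tau^2
    pooled = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, q, tau2

# Hypothetical treatment effects and variances from five trials
pooled, se, q, tau2 = dersimonian_laird(
    [0.45, 0.30, 0.22, 0.15, 0.10], [0.040, 0.020, 0.030, 0.010, 0.020])
print(f"pooled = {pooled:.3f} +/- {1.96 * se:.3f}, Q = {q:.2f}, tau^2 = {tau2:.4f}")
```

Because tau-squared inflates every study's variance by the same amount, the weights become more nearly equal; this is why large studies dominate less, and the confidence interval widens, relative to the fixed effects model.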
However, the statistical tests for homogeneity of study results are relatively insensitive, that is, they have low statistical power.

Often, explaining variability in the results of studies in order to generate hypotheses can be just as important as combining their results to test hypotheses. In regression analysis the dependent variable becomes the treatment effect, generally weighted by inverse variance, and various study, treatment, or patient characteristics can be used as independent or explanatory variables. Study quality indicators can be used either as one or more independent variables or as a regression weight. Regression analysis can be used to explain differences in results that are either continuous or dichotomous.
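A minimal meta-regression might look like the following Python sketch: a weighted least-squares line relating each study's treatment effect to a single study-level covariate (here, a hypothetical mean patient age), with inverse-variance weights. In keeping with the caution above, any association found this way is hypothesis generating only.

```python
# Hypothetical study-level data: treatment effect, its variance, and the
# mean age of patients in each study (the explanatory variable).
effects   = [0.45, 0.30, 0.22, 0.15, 0.10]
variances = [0.040, 0.020, 0.030, 0.010, 0.020]
mean_age  = [48, 55, 60, 66, 71]

w = [1 / v for v in variances]  # inverse-variance regression weights
xbar = sum(wi * x for wi, x in zip(w, mean_age)) / sum(w)  # weighted means
ybar = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
slope = (sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, mean_age, effects))
         / sum(wi * (x - xbar) ** 2 for wi, x in zip(w, mean_age)))
intercept = ybar - slope * xbar
print(f"estimated change in treatment effect per year of age: {slope:+.4f}")
```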
Most clinicians and investigators consider the gold standard for determining the effectiveness of a therapy to be the large, randomized, controlled trial. However, even very large randomized trials can produce inconclusive results. Nevertheless, much of the controversy surrounding the validity of meta-analysis has come from comparisons of meta-analysis results with the results of subsequent large clinical trials, and there have been several recent attempts to validate the results of meta-analysis against those of large randomized, controlled trials.

Recently, LeLorier and co-workers compared the results of meta-analyses published in four major medical journals with those of subsequent, large, randomized, controlled trials [12]. They identified 12 large, randomized, controlled trials and 19 meta-analyses, and were able to compare 40 primary and secondary outcomes. The positive predictive value of meta-analysis was 68% and the negative predictive value was 67%. Overall, the results of the large, randomized, controlled trials were not accurately predicted 35% of the time, although there was a statistically significant difference in results in only 5 of the 40 comparisons (12%). In no instance was there a divergence in which the meta-analysis and the randomized trial gave statistically significant results in opposite directions.

Borzak and Ridker examined reasons why the results of meta-analysis differed from those of large, randomized, controlled trials [13]. In their first example, the combined results of seven trials of intravenous nitroglycerin and three trials of nitroprusside (N = 2,000 patients) indicated that nitrates significantly reduced mortality after myocardial infarction [14]. Subsequently, two large randomized, controlled trials failed to confirm the results of the meta-analysis [15, 16]. There were several possible explanations for this apparent discrepancy: (1) In the large-scale trials, over one half of the control patients also received nitrates, possibly reducing differences between the two groups. (2) The number of patients in the meta-analysis was relatively small, reducing the reliability of its conclusions. (3) The trials in the meta-analysis were all completed several years before the large-scale trials, and in the meantime the adoption of new therapies led to a 50% decrease in mortality from myocardial infarction. Thus, nitrates may have been more effective in the earlier trials and less so in the later large-scale trials. Often it is possible to determine reasons for such discrepancies between meta-analyses of small trials and the results of large randomized, controlled trials.

Cappelleri and co-workers compared the results of pooled small trials with those of "large" trials [17]. Using a random effects model, they found agreement between 79 meta-analyses and large trials in 90% of comparisons when large trials were defined by size (N > 1,000 patients), and in 82% when large trials were defined by statistical power. Twice as many disagreements were found when a fixed effects model was used. Of the 15 comparisons that were statistically different (using the power approach to define large trials), in 9 the large trials failed to confirm the meta-analysis results, in 2 the large trials showed an effect of treatment not evident in the meta-analysis, and in 4 both indicated beneficial effects of treatment, albeit of different magnitudes. In 5 of the 15 disparities, significant differences were related to differences in the rate of events in the control populations. Statistical evidence of publication bias could be detected in at least 1, and possibly 3, of the 15 disparities. In another 4 instances there were specific protocol or study differences that were likely to explain the different results. In only 5 of the 15 instances were there no apparent reasons for the different results, and in 2 of these the differences were not clinically important, since the conclusions were the same but of different magnitude. The authors concluded that unexplained, clinically relevant differences between meta-analyses and large randomized, controlled trials are rare [17].

There are several reasons why the results of a meta-analysis may not predict those of large, randomized, controlled trials: (1) Times change. Usually, meta-analyses of small trials are performed before large, randomized, controlled trials are carried out.
If the overall care of patients improves, the outcome in the control group of the large-scale trial may be better than that of the control groups in the smaller trials included in the meta-analysis; as a result, the efficacy of treatment may appear relatively smaller in the large-scale trial. (2) Publication, file-drawer, or language bias. If the trials in the meta-analysis are not representative of all trials carried out, then the result may be biased. A small study is more likely to go unpublished than a large-scale study. (3) Differences in the individual trials. Heterogeneity in patient populations (such as age, gender, race, and type of renal disease) and study methods (such as dose, duration, and trial design) may cause the results of studies to differ. Large-scale trials are also likely to be designed differently than smaller trials; for example, treatment protocols in large trials tend to be simpler than those of small trials, to enable enrollment at multiple centers under varied conditions. (4) Poor study quality. Like those of any scientific study, the results of a meta-analysis can only be as good as the methods used to generate them. (5) Chance. As the number of meta-analyses increases, the likelihood that some will be positive as the result of chance (at P < 0.05) increases.

Meta-analysis assumes that the included trials are representative of all pertinent trials. When this is not the case, the results of the meta-analysis can be erroneous. The tendency of journal editors to favor publication of studies with "positive" outcomes can result in serious publication bias. A related problem occurs when investigators fail to submit "negative" results for publication, believing that publishing may be difficult, time consuming, and perhaps even futile; this has been called "file-drawer bias." Publication and/or file-drawer bias can influence the results of meta-analysis [17-19]. Publication bias is not unique to meta-analysis; it is a problem of interpreting trial results in general. In fact, meta-analysis has served a useful purpose in defining the problem and suggesting ways of dealing with it.

Techniques have been developed to assess the likelihood of publication bias. One technique uses funnel plots (Figure 1) [20]. Funnel plots take advantage of the fact that unpublished studies are more likely to be small. A plot of treatment effects on the y-axis versus study sample size on the x-axis should yield a funnel centered about a horizontal line representing the pooled treatment effect across all studies. The base of the funnel is wide, because treatment effects are more variable when study sample sizes are small; the top of the funnel is narrower, because treatment effects vary less when study sample sizes are large. When there is publication bias, part of the funnel near the base may appear to be missing, because fewer small negative studies than small positive studies were published.
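The following Python sketch draws a rudimentary funnel plot with matplotlib, using invented study results. A log scale on the sample-size axis spreads the small studies out; with publication bias, the small-study end of the cloud would look asymmetric around the pooled line.

```python
import matplotlib.pyplot as plt

# Hypothetical studies: sample sizes and their estimated treatment effects.
sizes   = [30, 45, 60, 80, 120, 200, 350, 500, 900, 1500]
effects = [0.62, 0.05, 0.48, 0.40, 0.35, 0.28, 0.33, 0.30, 0.31, 0.29]
pooled  = 0.31  # pooled treatment effect across all studies (hypothetical)

plt.scatter(sizes, effects)
plt.axhline(pooled, linestyle="--")  # the line the funnel should center on
plt.xscale("log")
plt.xlabel("Study sample size")
plt.ylabel("Treatment effect")
plt.title("Funnel plot: a missing corner among small studies suggests bias")
plt.show()
```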
An analogous plot uses study variance in place of sample size, and formal statistical techniques have been developed to examine correlations between the estimated treatment effects and their variances [21]. One method for detecting file-drawer bias estimates the number of studies with zero effect that would be needed to reduce the combined result to a non-significant level [22]. If the number of studies required is very large, it is unlikely that so many unpublished studies exist. However, this technique is based on the assumption that the unpublished studies have zero treatment effect.
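Rosenthal's "fail-safe N" [22] is easy to compute from the per-study z-scores. Here is a Python sketch with hypothetical z-scores: it solves Stouffer's combined z-test for the number of zero-effect studies that would pull the combined result below the one-tailed significance threshold.

```python
import math
from statistics import NormalDist

def fail_safe_n(z_scores, alpha: float = 0.05) -> int:
    """Number of unpublished zero-effect studies needed to render the
    Stouffer combined z-score non-significant (Rosenthal's file-drawer N)."""
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    # Stouffer: Z = sum(z) / sqrt(k + N); solve for N at Z = z_crit.
    n = (sum(z_scores) / z_crit) ** 2 - k
    return max(0, math.ceil(n))

# Hypothetical z-scores from five published trials
print(fail_safe_n([2.1, 1.8, 2.5, 1.2, 2.9]))  # about 36 null studies needed
```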
Inadequate literature search strategies can also threaten the validity of a meta-analysis. It is not necessary to include every pertinent study, but it is important that the selected studies be a representative sample of all studies. Whether unpublished studies should be included in a meta-analysis is controversial. Some authors stress the importance of including all study results, published or unpublished. However, unpublished studies may be unpublished precisely because they are flawed. The peer-review process is an important step designed to ensure that all pertinent information is given to the reader, and bypassing this process may be dangerous. Including unpublished studies also raises a number of practical concerns. In the absence of trial registries it may be very difficult to locate all pertinent clinical trials, and it is conceivable that the trials located will differ from those not located. In addition, the time, effort, and expense of locating unpublished studies may be prohibitive. Finally, the principle that meta-analysis results should be reproducible is seriously challenged by the inclusion of unpublished studies to which other investigators may not have ready access.

Excluding studies published in certain languages can also lead to a "Tower of Babel" bias. One study examined the effects of language publication bias on the results of meta-analysis [23]. These investigators reviewed all meta-analyses published in eight medical journals between 1991 and 1993. In only one of 36 meta-analyses did the exclusion of studies based on language produce results different from those that would have been obtained had studies in all languages been included [23].

Often, the care of patients improves in ways that are unrelated to the therapy being tested in clinical trials. As a result, outcomes may change over time among controls. Improvements in medical care that reduce adverse outcomes may reduce the treatment effect, making the effectiveness of a therapy appear to diminish in more recent studies (Table 2).

Table 2. Threats to the validity of a meta-analysis.

In many ways a meta-analysis can be thought of as a reproducible, scientific study. Indeed, a meta-analysis deserves the same scrutiny that is generally applied to individual clinical studies. Several characteristics suggest a meta-analysis of high quality (Table 3). The purpose or hypothesis should be clearly stated. A good meta-analysis includes as many relevant studies as possible, and the description of the search techniques should be clear. In the case of electronic searches, the search terms and databases should be specified. The inclusion and exclusion criteria used to determine which trials were analyzed should also be clearly stated. Some assessment of study quality should usually be made. How data were extracted should be clear, that is, whether by one or more independent reviewers. Ideally, the reviewers should be masked, although this may be difficult to accomplish if the studies are well known.

Table 3. Quality features to look for in a meta-analysis.

The methods used to combine studies should be indicated. Was a fixed effects or a random effects model used? Some effort should also be made to determine whether the studies were homogeneous; if they were not, this should be taken into account in the analysis and interpretation of the results. A search for reasons why studies produced different results is often very important. Were there differences in the patients studied, in how the intervention was applied, or in the duration of follow-up? Were studies weighted by inverse variance, by quality, or by some combination of the two? Finally, was a sensitivity analysis carried out to see how the results might have been affected by the assumptions made? Were the results similar if one or more outliers were deleted?

Much misunderstanding has resulted from the tendency to think of a meta-analysis as the final word. It is probably better to view meta-analysis as a technique, or tool, that can be applied in many different ways to different samples of studies. More than one meta-analysis on the same subject may be both appropriate and desirable. The most cogent reason for carrying out an additional meta-analysis is that new pertinent studies may be published that help to resolve outstanding issues. Indeed, some have advocated the use of cumulative meta-analysis, in which the same meta-analysis is repeated after each new applicable clinical trial is completed. In this way investigators can best judge the need for additional trials. For example, it has been shown that a cumulative meta-analysis might have made the beneficial effects of thrombolytic therapy after myocardial infarction apparent sooner, thereby obviating the need for additional trials that may have subjected some patients to unnecessary risk [24, 25].
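A cumulative meta-analysis simply re-pools the evidence each time a new trial appears. The Python sketch below, with invented trials, uses fixed effects pooling for brevity (a random effects version would substitute the DerSimonian-Laird weights shown earlier); watching the confidence interval narrow over time shows when further trials stop being informative.

```python
import math

# Hypothetical trials in chronological order: (year, effect, variance)
trials = [
    (1985, 0.50, 0.090),
    (1987, 0.35, 0.060),
    (1989, 0.30, 0.030),
    (1991, 0.28, 0.015),
    (1993, 0.27, 0.008),
]

effects, variances = [], []
for year, effect, variance in trials:
    effects.append(effect)
    variances.append(variance)
    w = [1 / v for v in variances]  # inverse-variance weights so far
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    print(f"through {year}: pooled effect {pooled:.2f} "
          f"(95% CI {pooled - 1.96 * se:.2f} to {pooled + 1.96 * se:.2f})")
```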
In addition, the same studies can be analyzed in different ways, using, for example, different techniques and endpoints. Although the argument can be made that there are currently too many review articles and chapters for the practicing clinician to read, a better argument can be made for replacing at least some narrative reviews with systematic reviews.

Meta-analysis is becoming a frequently used tool in nephrology, and several different issues have been addressed with it. A Medline search using the key words and major subject headings "meta-analysis" and "kidney," along with a search of the bibliographies of recent reviews, located 30 meta-analyses dealing with issues pertinent to nephrology (Table 4). Excluded from this list were several meta-analyses dealing with hypertension and its effect on cardiovascular disease. Two of these meta-analyses were published before 1990, 14 between 1990 and 1993, and 14 between 1994 and 1997.

Table 4. Some examples of meta-analysis in nephrology.

There have been seven meta-analyses of studies examining the effects of antihypertensive agents on the kidney. Experiments in animal models of diabetic and nondiabetic renal disease suggested that agents which reduce not only systemic blood pressure but also intraglomerular capillary pressure are particularly effective in reducing albuminuria and renal injury. The concept that not all antihypertensive agents are equal in their ability to reduce renal injury led to a large number of clinical trials comparing the effects of angiotensin-converting enzyme (ACE) inhibitors and other antihypertensive agents. Most of the early studies used urinary protein excretion as a surrogate endpoint, but more recent long-term trials have also measured changes in glomerular filtration rate. In 1993 we published the results of a meta-analysis showing that blood pressure reduction was associated with reduced proteinuria and improved renal function, and that ACE inhibitors had an additional beneficial effect on protein excretion and renal function.

References

4. Bailar JC III. The promise and problems of meta-analysis. N Engl J Med 1997;337:559-561.
5. Cooper H, Hedges LV, editors. The Handbook of Research Synthesis. New York: Russell Sage Foundation; 1994.
6. Glass GV, McGaw B, Smith ML. Meta-Analysis in Social Research. Beverly Hills: Sage Publications; 1981.
7. Bushman BJ. Vote-counting procedures in meta-analysis. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis. New York: Russell Sage Foundation; 1994. p. 193.
8. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials 1986;7:177-188.
9. Louis TA. Meta-analysis of clinical studies: the whole is greater than the sum of its parts. Transfusion 1993;33:698-700.
10. Hardy RJ, Thompson SG. A likelihood approach to meta-analysis with random effects. Stat Med 1996;15:619-629.
11. Louis TA, Zelterman D. Bayesian approaches to research synthesis. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis. New York: Russell Sage Foundation; 1994. p. 411.
12. LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536-542.
13. Borzak S, Ridker PM. Discordance between meta-analyses and large-scale randomized, controlled trials: examples from the management of acute myocardial infarction. Ann Intern Med 1995;123:873-877.
14. Yusuf S, Collins R, MacMahon S, Peto R. Effect of intravenous nitrates on mortality in acute myocardial infarction: an overview of the randomised trials. Lancet 1988;1:1088-1092.
15. Gruppo Italiano per lo Studio della Sopravvivenza nell'Infarto Miocardico. Effects of lisinopril and transdermal glyceryl trinitrate singly and together on 6-week mortality and ventricular function after acute myocardial infarction. Lancet 1994;343:1115-1122.
16. Fourth International Study of Infarct Survival Collaborative Group. A randomised factorial trial assessing early oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58,050 patients with suspected acute myocardial infarction. Lancet 1995;345:669-685.
17. Cappelleri JC, Ioannidis JPA, Schmid CH, de Ferranti SD, Aubert M, Chalmers TC, Lau J. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA 1996;276:1332-1338.
18. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867-872.
19. Dickersin K, Min Y-I, Meinert CL. Factors influencing publication of research results: follow-up of applications submitted to two institutional review boards. JAMA 1992;267:374-378.
20. Light RJ, Pillemer DB. Summing Up: The Science of Reviewing Research. Cambridge: Harvard University Press; 1984.
21. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994;50:1088-1101.
22. Rosenthal R. The "file-drawer problem" and tolerance for null results. Psychol Bull 1979;86:638-641.
23. Gregoire G, Derderian F, LeLorier J. Selecting the language of the publications included in a meta-analysis: is there a Tower of Babel bias? J Clin Epidemiol 1995;48:159-163.
24. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med 1992;327:248-254.
25. Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol 1995;48:45-57.