Abstract

Many systematic reviews incorporate nonrandomised studies of effects, sometimes called quasi-experiments or natural experiments. However, the extent to which nonrandomised studies produce unbiased effect estimates is unclear, either in expectation or in practice. The usual way that systematic reviews quantify bias is through "risk of bias assessment" and indirect comparison of findings across studies using meta-analysis. A more direct, practical way to quantify the bias in nonrandomised studies is through "internal replication research", which compares the findings from nonrandomised studies with estimates from a benchmark randomised controlled trial conducted in the same population. Despite the existence of many risk of bias tools, none is conceptualised to comprehensively assess nonrandomised approaches with selection on unobservables, such as regression discontinuity designs (RDDs). The few that are conceptualised with these studies in mind do not draw on the extensive literature on internal replications (within-study comparisons) of randomised trials.

Our research objectives were as follows.

Objective 1: to undertake a systematic review of nonrandomised internal study replications of international development interventions.

Objective 2: to develop a risk of bias tool for RDDs, an increasingly common method used in social and economic programme evaluation.

We used the following methods to achieve our objectives.

Objective 1: we searched systematically for nonrandomised internal study replications of benchmark randomised experiments of social and economic interventions in low- and middle-income countries (L&MICs). We assessed the risk of bias in benchmark randomised experiments and synthesised evidence on the relative bias effect sizes produced by benchmark and nonrandomised comparison arms.

Objective 2: we used document review and expert consultation to further develop a risk of bias tool for quasi-experimental studies of interventions (ROBINS-I) for RDDs.
Objective 1: we located 10 nonrandomised internal study replications of randomised trials in L&MICs, six of which use RDDs; the remainder use a combination of statistical matching and regression techniques. We found that the benchmark experiments used in internal replications in international development are in the main well conducted, but have "some concerns" about threats to validity, usually arising from the methods of outcomes data collection. Most internal replication studies report a range of different specifications for both the benchmark estimate and the nonrandomised replication estimate. We extracted and standardised 604 bias coefficient effect sizes from these studies and present average results narratively.

Objective 2: RDDs are characterised by prospective assignment of participants based on a threshold variable. Our review of the literature indicated that there are two main types of RDD. The most common type is designed retrospectively: the researcher identifies post hoc the relationship between outcomes and a threshold variable that determined assignment to the intervention at pretest. These designs usually draw on routine data collection such as administrative records or household surveys. The other, less common, type is a prospective design in which the researcher is also involved in allocating participants to treatment groups from the outset. We developed a risk of bias tool for RDDs.

Internal study replications provide the grounds on which bias assessment tools can be evidenced. We conclude that existing risk of bias tools need to be further developed for use by Campbell Collaboration authors, and there is a wide range of risk of bias tools and internal study replications to draw on in better designing these tools. We have suggested the development of a promising approach for RDDs. Further work is needed on other common methodologies in programme evaluation, for example statistical matching approaches.
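To make the estimand concrete, the following is a minimal sketch of effect estimation in a sharp RDD: units are assigned to treatment exactly when a running (threshold) variable crosses a cutoff, and the treatment effect is the jump in outcomes at that cutoff. The simulated data, variable names, bandwidth, and effect size here are illustrative assumptions, not figures from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, true_effect = 2000, 0.0, 0.5

# Running (threshold) variable determines assignment: treated iff x >= cutoff.
x = rng.uniform(-1, 1, n)
treated = (x >= cutoff).astype(float)
y = 1.0 + 0.8 * x + true_effect * treated + rng.normal(0, 0.3, n)

# Local linear regression within a bandwidth h of the cutoff, with separate
# slopes on each side; the coefficient on `treated` estimates the jump at
# the threshold, i.e., the treatment effect for units near the cutoff.
h = 0.25
m = np.abs(x - cutoff) <= h
X = np.column_stack([
    np.ones(m.sum()),               # intercept
    treated[m],                     # jump at the cutoff (the estimand)
    x[m] - cutoff,                  # slope below the cutoff
    treated[m] * (x[m] - cutoff),   # slope change above the cutoff
])
beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
effect_hat = beta[1]
print(round(float(effect_hat), 2))  # close to the true effect of 0.5
```

A retrospective RDD of the kind described above would run the same regression on administrative or survey data rather than simulated outcomes; bandwidth choice and functional form are the key specification decisions.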
We also highlight that broader efforts to identify all existing internal replication studies should consider more specialised systematic search strategies within particular literatures, so as to overcome the lack of systematic indexing of this evidence.
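The bias coefficient effect sizes synthesised above can be illustrated with a toy calculation: an internal replication's bias coefficient is the gap between the nonrandomised estimate and its randomised benchmark, with both effects on a standardised scale. The numbers below are invented for illustration only.

```python
# Hypothetical standardised mean differences (SMDs) from one internal
# replication: the benchmark RCT arm vs. the nonrandomised (e.g., RDD
# or matching) arm estimated on the same population.
benchmark_smd = 0.30
nonrandomised_smd = 0.42

# Bias coefficient: how far the nonrandomised estimate departs from
# the benchmark, in standard-deviation units of the outcome.
bias_coefficient = nonrandomised_smd - benchmark_smd
print(round(bias_coefficient, 2))  # 0.12
```

Averaging such coefficients (or their absolute values) across replications is one way the relative performance of benchmark and nonrandomised arms can be summarised.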

Highlights

  • Many systematic reviews include studies that use nonrandomised causal inference, hereafter called nonrandomised studies, and sometimes called quasi‐experiments (QEs; e.g., Bärnighausen, Røttingen, Rockers, Shemilt, & Tugwell, 2017; Shadish, Cook, & Campbell, 2002) or natural experiments (Dunning, 2012). For example, Konnerup and Kongsted (2012) found that half of the systematic reviews published in the Campbell Library up to 2012 included nonrandomised studies. The inclusion of nonrandomised studies in Campbell reviews is increasing: 81% of reviews published between 2012 and 2018 included such studies

  • Objective 1: we located 10 nonrandomised internal study replications of randomised trials in low‐ and middle‐income countries (L&MICs), six of which use a regression discontinuity design (RDD); the remainder use a combination of statistical matching and regression techniques

  • We found that the benchmark experiments used in internal replications in international development are in the main well conducted, but have “some concerns” about threats to validity, usually arising from the methods of outcomes data collection


Summary

Introduction

Many systematic reviews include studies that use nonrandomised causal inference, hereafter called nonrandomised studies, and sometimes called quasi‐experiments (QEs; e.g., Bärnighausen, Røttingen, Rockers, Shemilt, & Tugwell, 2017; Shadish, Cook, & Campbell, 2002) or natural experiments (Dunning, 2012). For example, Konnerup and Kongsted (2012) found that half of the systematic reviews published in the Campbell Library up to 2012 included nonrandomised studies. The inclusion of nonrandomised studies in Campbell reviews is increasing: 81% of reviews published between 2012 and 2018 included such studies. The inclusion of nonrandomised studies in reviews is justified by the lack of randomised study evidence for specific interventions, for example where randomisation is not considered feasible (Wilson, Gill, Olaghere, & McClure, 2016) or ethical (e.g., mortality outcomes), or to improve external validity, such as in measuring long‐term effects (Welch et al., 2016). Occasionally it is stated that these studies might produce unbiased estimates (e.g., De La Rue, Polanin, Espelage, & Piggot, 2014). It is not clear whether nonrandomised studies typically produce treatment effect estimates comparable to the unbiased estimates produced by well‐conducted randomised controlled trials (RCTs), either in expectation or in practice.

