IDENTIFICATION-ROBUST TWO-STAGE BOOTSTRAP TESTS WITH PRETESTING FOR EXOGENEITY
Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares or IV-based method is appropriate. Guggenberger (2010a, Econometric Theory, 26, 369–382) shows that the second-stage test – based on the outcome of a Durbin-Wu-Hausman-type pretest in the first stage – exhibits extreme size distortion, with asymptotic size equal to 1 when the standard critical values are used. In this paper, we first show that both conditional and unconditional on the data, standard wild bootstrap procedures are invalid for two-stage testing. Second, we propose an identification-robust two-stage test statistic that switches between OLS-based and weak-IV-robust statistics. Third, we develop a size-adjusted wild bootstrap approach for our two-stage test that integrates specific wild bootstrap critical values with an appropriate size-adjustment method. We establish uniform validity of this procedure under conditional heteroskedasticity or clustering in the sense that the resulting tests achieve correct asymptotic size, regardless of whether the identification is strong or weak. Our procedure is especially valuable for empirical researchers facing potential weak identification. In such settings, its power advantage is notable: whereas weak-IV-robust methods maintain correct size but often suffer from relatively low power, our approach achieves better performance.
- Research Article
- 10.5206/cjsotlrcacea.2024.2.14931
- Aug 31, 2024
- The Canadian Journal for the Scholarship of Teaching and Learning
Large enrollments in undergraduate courses pose several teaching and learning challenges that impact students’ learning experience and performance. Implementing co-learning activities in tutorials of large courses can help mitigate these challenges and improve the learning environment. One type of collaborative learning activity that has become increasingly popular is two-stage testing, but there are limitations to how two-stage testing has been conducted. We undertook a study to elucidate whether our modified two-stage testing protocol and other co-learning activities performed in tutorials can enhance the learning experiences of undergraduate students and foster a sense of community in a large-enrollment research methods course. The specific aims of our study were to: 1) Assess whether co-learning activities, including two-stage testing in tutorials, improve learning outcomes and foster cohort cohesion in a large-enrollment junior undergraduate science course. 2) Evaluate the impact of our modified two-stage testing approach on student learning and long-term retention. To assess cohort cohesion, students were asked to complete a survey and were invited to participate in focus groups. Results indicated that tutorials did foster cohort cohesion among students in the tutorial. The tutorial activities helped students scale down the course size and connect with their peers. We tested our modified two-stage testing protocol by administering a two-stage test (an individual test consisting of short-answer questions, followed by a group test that comprised a subset of the individual test questions and was completed during tutorials). Approximately three months after the individual test, a retention test was administered. Student grades were significantly higher in group tests compared to the individual tests. Interestingly, students on average scored 6.2% higher on the retention test questions that were from the group test, compared to questions that were only on the individual test. These results support the idea that group tests help improve student retention. Students reported tutorials and two-stage testing to be a positive learning experience.
- Research Article
1
- 10.1096/fasebj.2019.33.1_supplement.605.3
- Apr 1, 2019
- The FASEB Journal
Introduction: Anatomy is a foundational component of biomedical sciences. To directly address concerns regarding retention of anatomical knowledge, course assessments can be redesigned as learning opportunities. Specifically, collaborative two‐stage testing is an alternative to traditional ‘independent’ testing, previously shown to improve final exam performance and retention of course material. However, past evaluations of student retention have generally compared separate cohorts of students who write either an individual test or a two‐stage test; such a design fails to control for between‐student variables.
Aim: Building on previous work in the field, the primary aim of this research is to determine the impact of two‐stage collaborative testing on student recall (short‐term) and retention (long‐term) of anatomy knowledge while controlling for between‐student variables by employing a randomized crossover research design. Secondary aims of this research are to compare performance metrics between high and low performing students and evaluate students' perceptions regarding the collaborative testing structure.
Methods: At the initiation of ANAT110 (Anatomy for Medical Radiation Sciences) students (n=94) were randomized into 30 “anatomy groups” (AGs) of 3–4 students. Throughout the course AGs worked together on in‐class and in‐laboratory learning activities, including course assessments. Students were assessed using three segmented term tests (TT; 20% each) and one cumulative final exam (40%). Each TT began with all students individually completing a multiple choice exam (the IND condition). Following this, some students would convene in their AGs to collaboratively complete the same multiple choice exam in a condensed amount of time (the COL condition). To control for learning effects of the collaborative process, all 30 AGs completed TT1 as IND + COL. Experimental testing conditions were TT2 and TT3 (with crossover), where half the class completed an IND examination only and the other half completed an IND + COL examination. Data collection is currently in progress. Using results from an in‐class formative quiz (written 5 days following the TTs), robust 2×2 mixed‐factor statistical analyses will reveal the direct impact of testing condition (IND vs. COL) on anatomy recall. Using individualized final examination performance, segmented and coded for previous testing condition (IND vs. COL), similar statistical analyses will reveal the direct impact of testing condition on anatomy retention.
Results: Based on previous cohort studies, it is hypothesized that two‐stage collaborative testing will improve recall and retention of anatomical concepts. It is also hypothesized that the relative impact on performance will be consistent between low and high performers.
Importance: Holistically evaluating the educational impact and student perceptions of two‐stage collaborative testing is imperative for determining the future utility of this strategy in the context of human anatomy education.
Support or Funding Information: A portion of this work is funded by the Learning & Education Advancement Fund (seed grant) through the University of Toronto.
This abstract is from the Experimental Biology 2019 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
- Research Article
21
- 10.1002/sim.4780120704
- Apr 15, 1993
- Statistics in medicine
Two-stage testing involves a preliminary test of a nuisance parameter prior to testing a main hypothesis. In a two-by-two factorial trial, the treatment interaction is the nuisance to the inference about the efficacy of one of the treatments given alone. In comparing a combination therapy to both of its component therapies, the nuisance parameter is the difference in the component effects. When the preliminary test is an integral part of inference about the main parameter, the actual level of significance for the two-stage test procedure can be much higher than the desired nominal level. If one places no restriction on the value of the nuisance parameter, then any two-stage test with its significance level properly controlled has undesirable properties. This applies to comparative studies of combination agents relative to the component agents. When the interaction with an ineffective treatment is null, two-stage testing may have some power advantage for assessing monotherapy efficacy.
- Research Article
2
- 10.1080/07474949008836210
- Jan 1, 1990
- Sequential Analysis
A two-stage test is proposed for testing the hypothesis H0:F is exponential versus H1:F is NBU and not exponential, on the basis of a random sample from F. To compare the performance of the two-stage test with the corresponding one sample test, powers and expected sample sizes are computed by simulation for various alternatives. It is shown that, for the two-stage test with approximately the same power, the expected sample size is considerably smaller. Critical values are tabulated to permit application of the test. Finally, an illustrative example is given.
- Research Article
- 10.14419/ijamr.v3i4.3060
- Sep 20, 2014
- International Journal of Applied Mathematical Research
The achievable region of reliabilities is considered for a model with several possible hypothetical probability distributions partitioned into a pair of families. The achievable region for multiple hypotheses testing was examined by Tuncel. Decisions concerning the realized probability distribution of the object must be made on the basis of the samples received at each stage of the two-stage test. It is proved that the defined region for the vectors of reliabilities in the two-stage test characterizes the set of all achievable vectors, and advantages of two-stage testing are revealed. Keywords: LAO test, method of types, multiple hypotheses testing, reliability, two-stage test.
- Research Article
- 10.25115/eea.v28i3.4740
- Mar 14, 2021
- Studies of Applied Economics
Permanent-transitory decompositions and the analysis of the time series properties of economic variables at the business cycle frequencies strongly rely on the correct detection of the number of common stochastic trends (co-integration). Standard techniques for the determination of the number of common trends, such as the well-known sequential procedure proposed in Johansen (1996), are based on the assumption that shocks are homoskedastic. This contrasts with empirical evidence which documents that many of the key macro-economic and financial variables are driven by heteroskedastic shocks. In a recent paper, Cavaliere et al., (2010, Econometric Theory) demonstrate that Johansen's (LR) trace statistic for co-integration rank and both its i.i.d. and wild bootstrap analogues are asymptotically valid in non-stationary systems driven by heteroskedastic (martingale difference) innovations, but that the wild bootstrap performs substantially better than the other two tests in finite samples. In this paper we analyse the behaviour of sequential procedures to determine the number of common stochastic trends present based on these tests. Numerical evidence suggests that the procedure based on the wild bootstrap tests performs best in small samples under a variety of heteroskedastic innovation processes.
- Research Article
574
- 10.1016/j.jeconom.2003.10.030
- Apr 1, 2004
- Journal of Econometrics
Bootstrapping autoregressions with conditional heteroskedasticity of unknown form
- Research Article
27
- 10.2139/ssrn.358520
- Jan 1, 2002
- SSRN Electronic Journal
Conditional heteroskedasticity is an important feature of many macroeconomic and financial time series. Standard residual-based bootstrap procedures for dynamic regression models treat the regression error as i.i.d. These procedures are invalid in the presence of conditional heteroskedasticity. We establish the asymptotic validity of three easy-to-implement alternative bootstrap proposals for stationary autoregressive processes with m.d.s. errors subject to possible conditional heteroskedasticity of unknown form. These proposals are the fixed-design wild bootstrap, the recursive-design wild bootstrap and the pairwise bootstrap. In a simulation study all three procedures tend to be more accurate in small samples than the conventional large-sample approximation based on robust standard errors. In contrast, standard residual-based bootstrap methods for models with i.i.d. errors may be very inaccurate if the i.i.d. assumption is violated. We conclude that in many empirical applications the proposed robust bootstrap procedures should routinely replace conventional bootstrap procedures based on the i.i.d. error assumption.
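The fixed-design wild bootstrap named in this abstract can be sketched for an AR(1) model without intercept. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the ARCH(1) data-generating process, the Rademacher (±1) multipliers, and the percentile interval are all choices made here for the example.

```python
import random


def simulate_ar1_arch(n, phi=0.5, seed=1):
    """Simulate an AR(1) series whose errors follow a simple ARCH(1) process."""
    rng = random.Random(seed)
    y, e_prev = [0.0], 0.0
    for _ in range(n):
        sigma = (0.5 + 0.5 * e_prev ** 2) ** 0.5  # conditional std. dev.
        e_prev = sigma * rng.gauss(0.0, 1.0)
        y.append(phi * y[-1] + e_prev)
    return y


def ols_ar1(y):
    """OLS slope of y_t on y_{t-1} (no intercept) and the residuals."""
    lagged, current = y[:-1], y[1:]
    sxx = sum(a * a for a in lagged)
    phi_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    resid = [b - phi_hat * a for a, b in zip(lagged, current)]
    return phi_hat, resid


def fixed_design_wild_bootstrap(y, n_boot=499, seed=0):
    """Bootstrap draws of the AR(1) slope, keeping the regressors fixed."""
    rng = random.Random(seed)
    phi_hat, resid = ols_ar1(y)
    lagged = y[:-1]
    sxx = sum(a * a for a in lagged)
    draws = []
    for _ in range(n_boot):
        # Each residual is multiplied by an independent +/-1 draw, so the
        # bootstrap errors inherit the observed heteroskedasticity pattern.
        y_star = [phi_hat * a + u * rng.choice((-1.0, 1.0))
                  for a, u in zip(lagged, resid)]
        draws.append(sum(a * b for a, b in zip(lagged, y_star)) / sxx)
    return phi_hat, draws


series = simulate_ar1_arch(200)
estimate, boot = fixed_design_wild_bootstrap(series)
ordered = sorted(boot)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]
print(f"phi_hat={estimate:.3f}, approx. 95% percentile interval=({lower:.3f}, {upper:.3f})")
```

The recursive-design variant would instead rebuild each bootstrap series from its own lagged bootstrap values, and the pairwise bootstrap would resample (y_t, y_{t-1}) pairs; neither is shown here.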
- Research Article
1
- 10.2139/ssrn.2785162
- Jan 1, 2002
- SSRN Electronic Journal
Conditional heteroskedasticity is an important feature of many macroeconomic and financial time series. Standard residual-based bootstrap procedures for dynamic regression models treat the regression error as i.i.d. These procedures are invalid in the presence of conditional heteroskedasticity. We establish the asymptotic validity of three easy-to-implement alternative bootstrap proposals for stationary autoregressive processes with m.d.s. errors subject to possible conditional heteroskedasticity of unknown form. These proposals are the fixed-design wild bootstrap, the recursive-design wild bootstrap and the pairwise bootstrap. In a simulation study all three procedures tend to be more accurate in small samples than the conventional large-sample approximation based on robust standard errors. In contrast, standard residual-based bootstrap methods for models with i.i.d. errors may be very inaccurate if the i.i.d. assumption is violated. We conclude that in many empirical applications the proposed robust bootstrap procedures should routinely replace conventional bootstrap procedures based on the i.i.d. error assumption. JEL Classification: C15, C22, C52
- Research Article
15
- 10.2147/ijgm.s131909
- Apr 1, 2017
- International Journal of General Medicine
In this study, Bayes’ theorem was used to determine the probability of a patient having Lyme disease (LD), given a positive test result obtained using commercial test kits in clinically diagnosed patients. In addition, an algorithm was developed to extend the theorem to the two-tier test methodology. Using a disease prevalence of 5%–75% in samples sent for testing by clinicians, evaluated with a C6 peptide enzyme-linked immunosorbent assay (ELISA), the probability of infection given a positive test ranged from 26.4% when the disease was present in 5% of referrals to 95.3% when disease was present in 75%. When applied in the case of a C6 ELISA followed by a Western blot, the algorithm developed for the two-tier test demonstrated an improvement with the probability of disease given a positive test ranging between 67.2% and 96.6%. Using an algorithm to determine false-positive results, the C6 ELISA generated 73.6% false positives with 5% prevalence and 4.7% false positives with 75% prevalence. Corresponding data for a group of test kits used to diagnose HIV generated false-positive rates from 5.4% down to 0.1% indicating that the LD tests produce up to 46 times more false positives. False-negative test results can also influence patient treatment and outcomes. The probability of a false-negative test for LD with a single test for early-stage disease was high at 66.8%, increasing to 74.9% for two-tier testing. With the least sensitive HIV test used in the two-stage test, the false-negative rate was 1.3%, indicating that the LD test generates ~60 times as many false-negative results. For late-stage LD, the two-tier test generated 16.7% false negatives compared with 0.095% false negatives generated by a two-step HIV test, which is over a 170-fold difference. Using clinically representative LD test sensitivities, the two-tier test generated over 500 times more false-negative results than two-stage HIV testing.
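The Bayes' theorem calculation described in this abstract can be sketched as follows. This is an illustrative example only: the sensitivity and specificity values are hypothetical placeholders rather than the study's test characteristics, and the two-tier function assumes the two tests are conditionally independent given disease status, which the abstract does not state.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def two_tier_ppv(prevalence, sens1, spec1, sens2, spec2):
    """PPV when a positive first test is confirmed by a positive second test.

    Assumes conditional independence of the two tests given disease status
    (a simplification made for this sketch).
    """
    after_first = positive_predictive_value(prevalence, sens1, spec1)
    # The posterior after the first test acts as the prior for the second.
    return positive_predictive_value(after_first, sens2, spec2)


# Hypothetical test characteristics, chosen only for illustration.
SENS, SPEC = 0.70, 0.90

for prev in (0.05, 0.25, 0.75):
    ppv = positive_predictive_value(prev, SENS, SPEC)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  "
          f"false positives among positives={1.0 - ppv:.3f}")
```

The share of false positives among positive results is one minus the PPV, which matches how the abstract pairs its figures (26.4% probability of infection with 73.6% false positives at 5% prevalence).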
- Discussion
11
- 10.2147/ijgm.s145134
- Sep 1, 2017
- International journal of general medicine
In this study, Bayes' theorem was used to determine the probability of a patient having Lyme disease (LD), given a positive test result obtained using commercial test kits in clinically diagnosed patients. In addition, an algorithm was developed to extend the theorem to the two-tier test methodology. Using a disease prevalence of 5%-75% in samples sent for testing by clinicians, evaluated with a C6 peptide enzyme-linked immunosorbent assay (ELISA), the probability of infection given a positive test ranged from 26.4% when the disease was present in 5% of referrals to 95.3% when disease was present in 75%. When applied in the case of a C6 ELISA followed by a Western blot, the algorithm developed for the two-tier test demonstrated an improvement with the probability of disease given a positive test ranging between 67.2% and 96.6%. Using an algorithm to determine false-positive results, the C6 ELISA generated 73.6% false positives with 5% prevalence and 4.7% false positives with 75% prevalence. Corresponding data for a group of test kits used to diagnose HIV generated false-positive rates from 5.4% down to 0.1% indicating that the LD tests produce up to 46 times more false positives. False-negative test results can also influence patient treatment and outcomes. The probability of a false-negative test for LD with a single test for early-stage disease was high at 66.8%, increasing to 74.9% for two-tier testing. With the least sensitive HIV test used in the two-stage test, the false-negative rate was 1.3%, indicating that the LD test generates ~60 times as many false-negative results. For late-stage LD, the two-tier test generated 16.7% false negatives compared with 0.095% false negatives generated by a two-step HIV test, which is over a 170-fold difference. Using clinically representative LD test sensitivities, the two-tier test generated over 500 times more false-negative results than two-stage HIV testing.
- Research Article
15
- 10.1021/acs.jchemed.1c00219
- Jul 14, 2021
- Journal of Chemical Education
Two-stage quizzes and exams were implemented for both in-class quizzes and term tests in three sections of first-year General (Introductory) Chemistry. In the first stage, students completed the exam individually and submitted their papers. In the second stage, students collaborated with peers to complete a subset of the exam questions. The aim of the first stage was to evaluate students’ individual knowledge, while the second stage provided an opportunity for peer-led learning. Exam scores were calculated as a blend of scores on the two stages, between 80 and 85% for the first (individual) stage and 20–15% for the second (collaborative) stage. Students’ (total n = 129) written responses to open-ended questions in a long-answer student survey comparing two-stage and one-stage (in which there is only the individual portion) tests were qualitatively coded by thematic analysis, with themes developed through a grounded theory approach. The most significant conclusion was that students perceived that the two-stage test format helped to partially (but by no means fully) alleviate student exam anxiety when compared to a traditional one-stage test. Student responses were primarily positive about the two-stage format rather than negative about the one-stage format. The most common themes that emerged from student responses centered on: (1) improvement in grades, (2) positive discussion with peers, (3) immediate feedback from peers, and (4) less (perceived) pressure. Finally, students also expressed a very strong overall preference for two-stage over one-stage tests.
- Research Article
3
- 10.1109/tit.1985.1057080
- Sep 1, 1985
- IEEE Transactions on Information Theory
Asymptotic relative efficiencies of two-stage and multistage tests of a simple null hypothesis versus a simple location alternative are considered. Observations are taken in groups, not necessarily of the same size, and a test statistic is computed for comparison with thresholds. In a two-stage test, if the test statistic of the first group of observations exceeds an upper threshold, the alternative is accepted, and if it crosses a lower threshold, the null hypothesis is accepted. Otherwise, a second group of samples is taken and a second test is performed. Two different classes of two-stage tests are considered. One of them computes the test statistic in the second stage from observations in the second group alone, while the other uses both the first and the second groups of observations. It is shown that these tests are asymptotically more efficient than fixed-sample-size tests but are less efficient than sequential probability ratio tests. With proper choices of parameters, the improvement over fixed-sample-size tests can be significant, especially when the error probabilities are small. However, the complexity of two-stage tests is comparable to that of fixed-sample-size tests, making their use desirable. The efficiency of k-stage tests, k > 2, is also investigated with the conclusion that the behavior of a two-stage test can be enhanced by adding stages up to a point beyond which the effect of adding stages diminishes.
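The decision rule described in this abstract can be sketched in a few lines. The statistic used here (a sum scaled by the square root of the group size), the threshold values, and all names are assumptions made for illustration; the paper's actual statistic and threshold design are not given in the abstract.

```python
import math
import random


def two_stage_test(draw, n1, n2, lower, upper, final, pooled=True):
    """Return (decision, observations_used) for a two-stage location test.

    draw   : callable returning one observation
    pooled : if True, the second-stage statistic uses both groups;
             if False, it uses the second group alone.
    """
    first = [draw() for _ in range(n1)]
    t1 = sum(first) / math.sqrt(n1)
    if t1 >= upper:          # strong evidence for the alternative: stop early
        return "H1", n1
    if t1 <= lower:          # strong evidence for the null: stop early
        return "H0", n1

    second = [draw() for _ in range(n2)]
    if pooled:
        t2 = (sum(first) + sum(second)) / math.sqrt(n1 + n2)
    else:
        t2 = sum(second) / math.sqrt(n2)
    return ("H1" if t2 >= final else "H0"), n1 + n2


# Illustrative run with unit-variance Gaussian data and a location shift of 0.5.
rng = random.Random(0)
decision, used = two_stage_test(lambda: rng.gauss(0.5, 1.0),
                                n1=20, n2=20, lower=0.0, upper=2.5, final=1.8)
print(decision, used)
```

The thresholds above are not calibrated to any error probabilities; in practice they would be chosen to meet target type I and type II error rates.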
- Research Article
- 10.18844/ijire.v8i2.5411
- Dec 31, 2021
- International Journal of Innovative Research in Education
For meaningful learning to take place and to eliminate misconceptions, it is necessary to identify the process of conceptual change on any given topic. The research aims to examine the skills of science teacher candidates to prepare two-stage diagnostic questions on ‘Seasons, Climate and Weather Movements’. For this purpose, the research was carried out in 4 weeks with 40 (39 females, 1 male) second-grade science teacher candidates in the fall semester of the 2019-2020 academic year, using the holistic single-case study pattern of the ‘Observation-Based Case Study’. Tests prepared by pre-service teachers were applied to eighth-grade students. The ‘Diagnostic Test Preparation Rubric’ was collected using “Concept maps”, “Open-ended two-stage tests” and “Multiple choice two-stage tests”. As a result of the descriptive and content analysis, it was noticed that teacher candidates are usually at an intermediate and good level in determining the propositions of knowledge about the subject, developing a concept map, and a two-stage diagnostic test related to the subject content. Keywords: Diagnostic questions; science teacher candidates; seasons; climate; weather movements
- Book Chapter
3
- 10.1520/stp10762s
- Jan 1, 2003
The effect of contact pad geometry on the fretting fatigue behavior of high strength steel was examined using a clamping double bridge pad system. To investigate the effects of bridge pad shape, fretting fatigue tests were carried out using four types of bridge pad with different leg heights h*. The fretting fatigue strength decreased with decreasing h*, and the stick area increased with increasing h*. To investigate the effect of the pad leg height h*, the contact pressure distribution was measured with strain gauges and calculated using a finite element method. The contact pressure was concentrated on the inner side of the contact surface, where the main crack initiated. Two-stage tests were carried out to investigate the fretting fatigue damage. In the two-stage test, the specimen was cycled for a certain number of cycles under fretting conditions, after which the pads were removed and plain fatigue testing without fretting was continued.