Abstract

The criterion of reproducibility and how it functions in post-non-classical science are discussed in the Russian methodology of science. At the same time, critics avoid statistical calculations in their arguments. This raises two questions: “What is reproducibility?” and “What is the mathematical formulation of the reproducibility criterion?” A literature review identified five indicators of reproducibility proposed by foreign colleagues. These indicators are being tested and discussed. However, there is no general mathematical formulation of the reproducibility criterion (an integral criterion covering these indicators), and these indicators have not yet become a standard. In the present work, we compare two statistical tests related to one of these five indicators of reproducibility.

Purpose of the study. The aim of this paper is to compare the powers of two tests of statistical significance that can be used to reveal an effect under the requirement that research results be reproducible. Here, reproducibility is assessed by the “significance” indicator. Under the first criterion, the effect is considered revealed if the effect size is significant in all studies (i.e., if the significance of the effect size is reproduced in every study). Under the second criterion, the effect is considered revealed if the weighted mean effect size obtained by meta-analysis is significant (the effect size need not be significant in individual studies).

Materials and methods. Methods of mathematical statistics are used to achieve this goal. The powers of the two tests are compared using two estimates: the first is theoretical, and the second was obtained in a statistical experiment. The powers are calculated 1) for different values of Cohen's effect size: “small”, “medium” and “large”; 2) for different degrees of heterogeneity: zero (fixed-effect model) and nonzero (random-effects model); and 3) for different numbers of primary studies (from 2 to 8).

Results. The power of the first test is lower, or much lower, than that of the second. The power of the first test decreases as the number of primary studies grows, whereas the power of the second increases. Given the conventional target power of 80%, the first criterion is unsuitable for the parameter values of primary studies considered here (that is, when the significance of the effect size in each study is determined by a two-tailed t-test at a significance level of 0.05 with two samples of typical size n = 25), whereas the power of the second test can, if necessary, be increased by including more primary studies in the meta-analysis.

Conclusion. If the reproducibility criterion known from the philosophy of science is intended to confirm the existence of an effect (relationship), or in other words to reveal the effect, under conditions where the measurement process has a substantial random component, it is advisable to apply the second test rather than the first.
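The theoretical side of this comparison can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are ours, not necessarily the author's: unit population standard deviations, n = 25 per group, a normal (z) approximation in place of the exact t-test power, independent primary studies, and, for Criterion 2, a fixed-effect meta-analysis of equal-sized studies (so the inverse-variance weights are equal). The helper names `single_study_power`, `power_criterion1` and `power_criterion2` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def _two_sided_power(noncentrality: float, alpha: float) -> float:
    """Power of a two-sided z-test whose statistic is N(noncentrality, 1)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (_STD_NORMAL.cdf(noncentrality - z_crit)
            + _STD_NORMAL.cdf(-noncentrality - z_crit))


def single_study_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of one two-sample study (n per group, unit SDs)."""
    se = (2 / n) ** 0.5  # standard error of the difference of means
    return _two_sided_power(d / se, alpha)


def power_criterion1(d: float, n: int, k: int, alpha: float = 0.05) -> float:
    """Criterion 1: the effect must be significant in all k independent studies."""
    return single_study_power(d, n, alpha) ** k


def power_criterion2(d: float, n: int, k: int, alpha: float = 0.05) -> float:
    """Criterion 2: fixed-effect meta-analysis of k equal-sized studies.

    With equal n the inverse-variance weights are equal, so the weighted
    mean has standard error sqrt(2 / n) / sqrt(k).
    """
    se = (2 / n) ** 0.5 / k ** 0.5
    return _two_sided_power(d / se, alpha)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):  # Cohen's "small", "medium", "large"
        for k in (2, 4, 8):
            p1 = power_criterion1(d, n=25, k=k)
            p2 = power_criterion2(d, n=25, k=k)
            print(f"d={d} k={k}: criterion 1 = {p1:.3f}, criterion 2 = {p2:.3f}")
```

Because the per-study power is below 1, raising it to the power k makes Criterion 1 shrink as studies are added, while the standard error of the pooled mean shrinks as 1/√k, so Criterion 2 grows. This reproduces the qualitative pattern stated in the Results, though the z approximation slightly overstates the exact t-test power.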

Highlights

  • Comparison of the powers of statistical tests in the context of the debate about the reproducibility criterion

  • Critics avoid statistical calculations in their arguments. This raises two questions: “What is reproducibility?” and “What is the mathematical formulation of the reproducibility criterion?” A literature review identified five indicators of reproducibility proposed by foreign colleagues

  • The aim of this paper is to compare the powers of two tests of statistical significance that can be used to reveal an effect under the requirement that research results be reproducible


Summary

Simulation of empirical data

The first type of data corresponds to the fixed-effect model, and the second type to the random-effects model. For the fixed-effect model, the same fixed true effect size is set in every study of a series. Three different means of the first population (m1) are set: 0.2, 0.5 and 0.8 greater than the mean of the second population (m2), so that the true value of Cohen's effect size is d = m1 − m2. For the random-effects model, the mean of the first population is constant within an individual study but varies randomly from one study to another: m1 ~ N(m, τ²), where τ² is the between-study variance of the true effect, and m is a variable whose value is likewise set 0.2, 0.5 and 0.8 greater than the mean of the second population (m2), giving the expected value of the true effect μ = m − m2.
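The data-generating scheme above, together with both decision rules, can be sketched as a small Monte Carlo experiment. This is an illustrative sketch, not the author's code. It assumes unit standard deviations in both populations, m2 = 0 (so μ equals the true effect size), n = 25 per group (so the two-sided 5% critical value of Student's t with 48 degrees of freedom, about 2.0106, is hard-coded), the large-sample variance of Cohen's d for the meta-analysis weights, and a DerSimonian–Laird random-effects estimator whenever τ² > 0. All function and constant names are hypothetical.

```python
import math
import random
import statistics

N = 25  # sample size per group in every primary study (assumed)
T_CRIT = 2.0106  # two-sided 5% critical value of Student's t, df = 2 * N - 2 = 48
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% normal cutoff, about 1.96


def simulate_study(m1, rng, m2=0.0, n=N):
    """Draw one primary study: samples from N(m1, 1) and N(m2, 1)."""
    x1 = [rng.gauss(m1, 1.0) for _ in range(n)]
    x2 = [rng.gauss(m2, 1.0) for _ in range(n)]
    return x1, x2


def cohens_d_and_t(x1, x2):
    """Cohen's d (pooled SD) and the pooled-variance two-sample t statistic."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = statistics.fmean(x1), statistics.fmean(x2)
    ss = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    sd_pooled = math.sqrt(ss / (n1 + n2 - 2))
    diff = mean1 - mean2
    return diff / sd_pooled, diff / (sd_pooled * math.sqrt(1 / n1 + 1 / n2))


def meta_z(ds, n=N, random_effects=False):
    """z statistic of the inverse-variance weighted mean of Cohen's d values."""
    v = [2 / n + d * d / (4 * n) for d in ds]  # large-sample var(d), n1 = n2 = n
    w = [1 / vi for vi in v]
    if random_effects and len(ds) > 1:
        # DerSimonian-Laird estimate of the between-study variance tau^2
        mean_fe = sum(wi * d for wi, d in zip(w, ds)) / sum(w)
        q = sum(wi * (d - mean_fe) ** 2 for wi, d in zip(w, ds))
        c = sum(w) - sum(wi * wi for wi in w) / sum(w)
        tau2_hat = max(0.0, (q - (len(ds) - 1)) / c)
        w = [1 / (vi + tau2_hat) for vi in v]
    mean = sum(wi * d for wi, d in zip(w, ds)) / sum(w)
    return mean * math.sqrt(sum(w))  # mean divided by its standard error


def estimate_power(mu, k, tau2=0.0, reps=2000, seed=1):
    """Monte Carlo power of Criterion 1 and Criterion 2 for k primary studies."""
    rng = random.Random(seed)
    hits1 = hits2 = 0
    for _ in range(reps):
        ds, ts = [], []
        for _ in range(k):
            # Fixed effect: m1 = mu in every study; random effects: m1 ~ N(mu, tau2)
            m1 = rng.gauss(mu, math.sqrt(tau2)) if tau2 > 0 else mu
            d, t = cohens_d_and_t(*simulate_study(m1, rng))
            ds.append(d)
            ts.append(t)
        # Criterion 1: two-sided t-test significant in every study
        hits1 += all(abs(t) > T_CRIT for t in ts)
        # Criterion 2: pooled effect significant in the meta-analysis
        hits2 += abs(meta_z(ds, random_effects=tau2 > 0)) > Z_CRIT
    return hits1 / reps, hits2 / reps


if __name__ == "__main__":
    for mu in (0.2, 0.5, 0.8):
        p1, p2 = estimate_power(mu, k=4)
        print(f"mu={mu}, k=4: criterion 1 = {p1:.3f}, criterion 2 = {p2:.3f}")
```

Only the t cutoff depends on the per-group sample size, which is why it is fixed for n = 25; for other sample sizes it would have to be recomputed, for example with `scipy.stats.t.ppf`.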

Significance of the result of an individual study
Significance of the result under Criterion 1
Significance of the result under Criterion 2
Fixed-effect model
Random-effects model
Conclusion
Polyakova YA
