Background
The reproducibility crisis in AI research remains a significant concern. While code sharing has been acknowledged as a step toward addressing this issue, our focus extends beyond this paradigm. In this work, we explore “federated testing” as an avenue for advancing reproducible AI research and development, especially in medical imaging. Unlike federated learning, where a model is developed and refined on data from different centers, federated testing involves models developed by one team being deployed and evaluated by others, addressing reproducibility across different implementations.

Methods
Our study follows an exploratory design aimed at systematically evaluating sources of discrepancy in the execution and outputs of a shared medical imaging model on the same input data, independent of generalizability analysis. We distributed the same model code to multiple independent centers, monitoring execution in different runtime environments while considering various real-world scenarios for pre- and post-processing steps. We analyzed deployment infrastructure by comparing the impact of different computational resources (GPU vs. CPU) on model performance. To assess federated testing of AI models for medical imaging, we performed a comparative evaluation across centers, each with distinct pre- and post-processing steps and deployment environments, specifically targeting AI-driven positron emission tomography (PET) image segmentation. More specifically, we studied federated testing of an AI-based model for surrogate total metabolic tumor volume (sTMTV) segmentation in PET imaging: the AI algorithm, trained on maximum intensity projection (MIP) data, segments lymphoma regions and estimates sTMTV.

Results
Our study reveals that relying solely on open-source code sharing does not guarantee reproducible results, owing to variations in code execution, runtime environments, and incomplete input specifications. Deploying the segmentation model on local or virtual GPUs versus in Docker containers had no effect on reproducibility. However, significant sources of variability were found in data preparation and pre-/post-processing techniques for PET imaging. These findings underscore the limitations of code sharing alone in achieving consistent and accurate results in federated testing.

Conclusion
Achieving consistently precise results in federated testing requires more than sharing models as open-source code. Comprehensive pipeline sharing, including pre- and post-processing steps, is essential. Cloud-based platforms that automate these processes can streamline AI model testing across diverse locations. Standardizing protocols and sharing complete pipelines can significantly enhance the robustness and reproducibility of AI models.
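To illustrate the kind of pre-processing step where independent centers can silently diverge, the sketch below computes maximum intensity projections (MIPs) from a 3D PET volume in SUV units. The array name, axis ordering, and clipping value are assumptions for illustration only, not the authors' published pipeline.

```python
import numpy as np

def pet_mips(suv_volume: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Compute coronal and sagittal maximum intensity projections (MIPs)
    from a 3D PET volume given in SUV units.

    Assumes an (axial, coronal, sagittal) axis order; a different
    DICOM/NIfTI orientation convention at another center would change
    which anatomical planes these projections represent.
    """
    coronal_mip = suv_volume.max(axis=1)   # collapse the coronal axis
    sagittal_mip = suv_volume.max(axis=2)  # collapse the sagittal axis
    return coronal_mip, sagittal_mip

def normalize_mip(mip: np.ndarray, suv_clip: float = 25.0) -> np.ndarray:
    """Clip and rescale a MIP to [0, 1] before model input.

    The clip value here is an assumed example; an unshared constant like
    this is exactly the kind of unstated pre-processing choice that can
    break reproducibility across centers.
    """
    return np.clip(mip, 0.0, suv_clip) / suv_clip
```

Under these assumptions, two centers applying the same shared model to the same DICOM series but using different orientation conventions or clip values would produce numerically different MIPs, consistent with the pre-/post-processing variability reported above.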