Abstract
Deep learning algorithms are powerful tools to analyse, restore and transform bioimaging data, and they are increasingly used in life sciences research. These approaches now outperform most other algorithms for a broad range of image analysis tasks. In particular, one of the promises of deep learning is the possibility of parameter-free, one-click data analysis that achieves expert-level performance in a fraction of the time previously required. However, as with most new and upcoming technologies, the potential for inappropriate use is raising concerns in the biomedical research community. This perspective provides a short overview of key concepts that we believe are important for researchers to consider when using deep learning in their microscopy studies. These comments are based on our own experience gained while optimising various deep learning tools for bioimage analysis, and on discussions with colleagues from both the developer and user communities. In particular, we focus on how results obtained using deep learning can be validated, and discuss what should, in our view, be considered when choosing a suitable tool. We also suggest which aspects of a deep learning analysis should be reported in publications to guarantee that the work can be reproduced. We hope this perspective will foster further discussion between developers, image analysis specialists, users and journal editors to define adequate guidelines and ensure that this transformative technology is used appropriately.
Highlights
Microscopy is a leading technology used to gain fundamental insights in biological research.
Choosing the most appropriate analysis tool largely depends on the task, or combination of tasks, to be performed, the scale of the analysis and the level of computational skill required to run the tool. In this Comment, we suggest a set of best practices for implementing and reporting on the use and development of deep learning (DL) image analysis tools.
We propose that many of these concerns can be significantly alleviated by careful assessment of DL model performance, careful choice of tool, and adherence to reporting guidelines that ensure transparency.
Summary
Inappropriately chosen or trained models can produce errors that are often difficult to detect without detailed analysis of the network output. For instance, unsuitable segmentation models will lead to under- and over-segmentation, while inappropriate image denoising and restoration models may lead to poor performance, image degradation and hallucinations (Fig. 2; see refs. 29,30 for reviews on this topic). Large object detection or segmentation datasets are very time-consuming to create, as they require experts to annotate hundreds to thousands of images manually, with three-dimensional datasets being especially difficult to generate. The curation of such datasets would be greatly facilitated by a centralized repository where training datasets generated to analyze microscopy data using DL would be available (for example, as done by the Papers with Code initiative; https://paperswithcode.com/datasets). This would help to produce and disseminate benchmark datasets[14,33] accessible to both algorithm developers and life scientists.

[Fig. 2 compares a classical denoising algorithm (PureDenoise) with a model trained on SiR-DNA images; mSSIM: 0.83, NRMSE: 0.13, PSNR: 31.7.]
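The metrics reported in Fig. 2 (mSSIM, NRMSE and PSNR) are standard ways to quantify how closely a model's output matches a ground-truth image. As a minimal sketch of how such an assessment can be run, the snippet below computes PSNR and NRMSE in plain NumPy on synthetic data (the images here are illustrative, not from the paper; in practice, scikit-image's `skimage.metrics` module provides these metrics, including `structural_similarity` for SSIM):

```python
import numpy as np

def psnr(gt, pred, data_range=1.0):
    """Peak signal-to-noise ratio in decibels; higher is better."""
    mse = np.mean((gt - pred) ** 2)
    return 20 * np.log10(data_range) - 10 * np.log10(mse)

def nrmse(gt, pred):
    """Root-mean-square error normalised by the ground-truth energy; lower is better."""
    return np.sqrt(np.mean((gt - pred) ** 2)) / np.sqrt(np.mean(gt ** 2))

# Illustrative example: compare a hypothetical "restored" image against ground truth.
rng = np.random.default_rng(0)
gt = rng.random((64, 64))                                  # synthetic ground truth
restored = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)  # synthetic model output

print(f"PSNR:  {psnr(gt, restored):.1f} dB")
print(f"NRMSE: {nrmse(gt, restored):.3f}")
```

Computing these metrics on held-out test images (never on training data) is one concrete way to perform the model-performance assessment advocated above.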