Abstract

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.

Highlights

  • Modern microscopy methods enable researchers to capture images that describe cellular and molecular features in biological samples at an unprecedented scale

  • To evaluate the impact of deep learning (DL) on bioimage analysis results, we instantiated three exemplary DL-based strategies (Figure 1; strategies color-coded in gray, blue, and orange) and investigated them in terms of objectivity, reliability, and validity of fluorescent feature annotation

  • The present study contributes to bridging the gap between ‘methods’-oriented and ‘biology’-oriented studies in image feature analysis (Meijering et al., 2016)

Summary

Introduction

Modern microscopy methods enable researchers to capture images that describe cellular and molecular features in biological samples at an unprecedented scale. The fluorescent images were interpreted manually by a group of human experts. Their results were used to train a large variety of deep learning models. Griebel et al. concluded that combining the knowledge of multiple experts reduces the subjectivity of bioimage annotation by deep learning algorithms. Combining such consensus information in a group of deep learning models improves the quality of bioimage analysis, so that the results are reliable, transparent and less subjective. The present study asks whether DL, if instantiated in an appropriate manner, holds the potential to enhance the objectivity, reproducibility and validity of bioimage analysis. To tackle this conundrum, we investigated different DL-based strategies on five fluorescence image datasets. We demonstrate that ensembles of consensus models are even capable of enhancing the reliability and validity of bioimage analysis of ambiguous image data, such as fluorescence features in histological tissue sections.
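The two ideas named above, a consensus annotation derived from several experts and an ensemble of models, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the text: pixel-wise majority voting stands in for the ground truth estimation step (the summary does not say which estimator is used), the ensemble is a plain average of per-model probability maps, and all names (`majority_vote`, `ensemble_predict`, the toy data) are hypothetical.

```python
from typing import List

Mask = List[List[int]]        # binary mask: 1 = feature, 0 = background
ProbMap = List[List[float]]   # per-pixel foreground probability


def majority_vote(annotations: List[Mask]) -> Mask:
    """Pixel-wise majority vote over binary masks from several annotators.

    Assumption: simple voting is only a stand-in for a ground truth
    estimator; ties are counted as background.
    """
    n = len(annotations)
    rows, cols = len(annotations[0]), len(annotations[0][0])
    return [
        [1 if 2 * sum(a[r][c] for a in annotations) > n else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def ensemble_predict(prob_maps: List[ProbMap], threshold: float = 0.5) -> Mask:
    """Average the probability maps of several models, then threshold."""
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [1 if sum(p[r][c] for p in prob_maps) / n >= threshold else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# Three hypothetical annotators label the same 2x3 image patch.
experts = [
    [[1, 0, 1], [0, 0, 1]],
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 1]],
]
consensus = majority_vote(experts)

# Three hypothetical models, each trained on the consensus, predict the patch.
model_outputs = [
    [[0.9, 0.2, 0.4], [0.1, 0.3, 0.8]],
    [[0.8, 0.6, 0.3], [0.2, 0.1, 0.9]],
    [[0.7, 0.1, 0.2], [0.0, 0.4, 0.7]],
]
prediction = ensemble_predict(model_outputs)
```

In this toy patch, a pixel that only one of three annotators marked is dropped from the consensus, and a pixel that only one model scores highly is dropped from the ensemble output. That is the sense in which pooling annotators and pooling models can dampen individual bias.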

Results
Discussion
Limitations
Materials and methods
Evaluation metrics
7.10.5 Transfer learning
7.12.1 Statistical analysis of fluorescent feature quantifications
7.12.2 Effect size calculation
Funding