Abstract

Pinpointing subcellular protein localizations from microscopy images is easy to the trained eye, but challenging to automate. Based on the Human Protein Atlas image collection, we held a competition to identify deep learning solutions to solve this task. Challenges included training on highly imbalanced classes and predicting multiple labels per image. Over 3 months, 2,172 teams participated. Despite convergence on popular networks and training techniques, there was considerable variety among the solutions. Participants applied strategies for modifying neural networks and loss functions, augmenting data and using pretrained networks. The winning models far outperformed our previous effort at multi-label classification of protein localization patterns by ~20%. These models can be used as classifiers to annotate new images, feature extractors to measure pattern similarity or pretrained networks for a wide range of biological applications.


Introduction

Advancement in high-throughput microscopy has propelled the generation of massive amounts of biological imaging data[1]. Unbiased analysis of subcellular protein localizations from our images has greatly enriched our vocabulary for describing cellular systems. This analysis was first performed manually[3], and we have since integrated the labor-intensive annotation tasks into a mainstream video game[5], which produced tens of millions of human annotations. These annotations were successful at the challenging task of identifying mixed patterns of protein localizations, a task called multi-label classification[6]. Compared to Loc-CAT[5], which uses hand-crafted features as inputs, CNNs typically take raw images as inputs and learn hierarchical feature representations in an end-to-end fashion. This allows the model to better abstract cellular localization patterns and scale efficiently with data size[14]. Finding the best solution for classifying protein localizations within HPA Cell Atlas Images involves performing searches of

Nature Methods | VOL 16 | December 2019 | 1254–1261 | www.nature.com/naturemethods
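
To make the multi-label setting concrete, the following is a minimal sketch and not the method of any competing team. It assumes a hypothetical four-class label set (`CLASSES`), a hypothetical per-class weight vector (`pos_weights`) to counter class imbalance, and a fixed decision threshold of 0.5; none of these values come from the paper. Each image receives one independent yes/no decision per location class, so a protein seen in several compartments gets several labels.

```python
import math

# Hypothetical label set for illustration only; the real challenge
# used a larger set of subcellular location classes.
CLASSES = ["nucleoplasm", "cytosol", "mitochondria", "nucleoli"]


def multi_hot(labels, classes=CLASSES):
    """Encode a set of location names as a 0/1 vector, one slot per class."""
    return [1 if c in labels else 0 for c in classes]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """Stable log(1 + exp(x)); note -log(sigmoid(z)) == softplus(-z)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(logits, targets, pos_weights):
    """Mean per-class binary cross-entropy on raw logits.

    pos_weights (an assumption of this sketch) scales the positive term
    so that rare classes contribute more to the loss.
    """
    total = 0.0
    for z, t, w in zip(logits, targets, pos_weights):
        total += w * t * softplus(-z) + (1 - t) * softplus(z)
    return total / len(logits)


def predict(logits, threshold=0.5, classes=CLASSES):
    """Return every class whose sigmoid score reaches the threshold."""
    return [c for c, z in zip(classes, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    target = multi_hot({"cytosol", "nucleoli"})
    logits = [2.0, -1.0, 0.3, -3.0]  # made-up network outputs
    print(target)
    print(predict(logits))
    print(weighted_bce(logits, target, pos_weights=[1.0, 1.0, 4.0, 4.0]))
```

Because each class is scored independently, the predicted set can be empty or contain several locations, which is the distinction from single-label classification, where exactly one class is chosen per image.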
