Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

Zheng Yin, Xiaobo Zhou, Stephen TC Wong, Chris Bakal, Youxian Sun, Fuhai Li, Norbert Perrimon

doi:10.1186/1471-2105-9-264

Abstract

Background

The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has depended on the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods that identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks: restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones, and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.

Results

Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images that are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4], and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with accuracy in the range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.

Conclusion

We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variations in starting conditions (including the number and composition of existing phenotypes), and datasets from different screens. Based on our findings, the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
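The idea of using a model of an existing phenotype as the reference distribution in the gap statistic can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single diagonal Gaussian as a stand-in for the GMM, plain Lloyd's k-means for clustering, and the usual gap rule (pick the smallest k with Gap(k) >= Gap(k+1) - s(k+1)). All function names (`fit_gaussian`, `within_dispersion`, `gap`, `estimate_k`) and the toy 2-D data are hypothetical.

```python
import math
import random


def fit_gaussian(points):
    """Per-dimension mean/std: a one-component diagonal stand-in for a GMM."""
    dims = list(zip(*points))
    means = [sum(d) / len(d) for d in dims]
    stds = [math.sqrt(sum((x - m) ** 2 for x in d) / len(d))
            for d, m in zip(dims, means)]
    return means, stds


def sample_reference(model, n, rng):
    """Draw n reference points from the fitted phenotype model."""
    means, stds = model
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_dispersion(points, k, rng, iters=25):
    """Run Lloyd's k-means and return the within-cluster sum of squares W_k."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [[sum(d) / len(g) for d in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def gap(points, k, model, rng, n_ref=10):
    """Gap(k) = mean(log W_k on reference samples) - log W_k on the data."""
    log_w = math.log(within_dispersion(points, k, rng))
    ref = [math.log(within_dispersion(
               sample_reference(model, len(points), rng), k, rng))
           for _ in range(n_ref)]
    mean_ref = sum(ref) / n_ref
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref) / n_ref)
    return mean_ref - log_w, sd * math.sqrt(1 + 1 / n_ref)


def estimate_k(points, model, rng, k_max=2):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); k_max if none qualifies."""
    gaps = [gap(points, k, model, rng) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        g_k, _ = gaps[k - 1]
        g_next, s_next = gaps[k]
        if g_k >= g_next - s_next:
            return k
    return k_max


rng = random.Random(0)
# Toy data (assumption): one known phenotype and one clearly different group.
known = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
novel = [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)]

model = fit_gaussian(known)                      # reference = known phenotype
k_novel = estimate_k(known + novel, model, rng)  # cluster count after merging
print("estimated clusters after adding new cells:", k_novel)
```

In this toy setting, an estimate above one for the merged set suggests that the new cells do not fit the existing phenotype. Replacing `fit_gaussian` with a multi-component mixture fit would bring the sketch closer to the GMM reference described in the abstract.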

Highlights

The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets
We propose to tackle this problem by using Gaussian Mixture Models (GMMs) as reference distributions for existing phenotypes in gap statistics, and we validate our method using simulation

Summary

Introduction

The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. To facilitate similar analysis of image-based screens, we and other researchers have recently developed novel image segmentation algorithms to rapidly quantitate hundreds of different parameters at a single-cell level in an automated fashion [3,4,5,6], and we have demonstrated that such image segmentation algorithms can be used in the context of genetic screens [7]. This and other similar screens [8] have been 50–100 fold smaller in scale than typical low-dimensional screens and are not yet genome-scale. Automated feature space reduction schemes have been implemented in the context of high-content screens, including feature extraction methods examined in [9], factor analysis in …

Journal: BMC Bioinformatics	Publication Date: Jun 5, 2008
Citations: 64	License type: cc-by
