Abstract

Background: Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods that can handle large image data sets are therefore required for automatic cellular phenotype identification. In this paper we investigated an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then used for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been used extensively in image processing applications. The curvelet transform provides a sparser representation of the image than the wavelet transform, offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. Combining Haralick features with curvelet transform features can further increase classification accuracy by exploiting their complementary information. We then investigated the applicability of the random subspace (RS) ensemble method to phenotype classification based on microscopy images. Each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.

Results: Experimental results on phenotype recognition from three benchmark image sets (HeLa, CHO and RNAi) show the effectiveness of the proposed approach. The combined feature set gives higher classification accuracy than either individual feature set. The ensemble model produces better classification performance than its component neural networks. For the HeLa, CHO and RNAi image sets, the random subspace ensemble achieves classification rates of 91.20%, 98.86% and 91.03% respectively, which compare favourably with the published results of 84%, 93% and 82% from WND-CHARM, a multi-purpose image classifier that applies wavelet transforms and other feature extraction methods. We investigated the problem of estimating the ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.

Conclusions: The multiscale and multidirectional characteristics of the curvelet transform suit the description of microscopy images very well. It is empirically demonstrated that curvelet-based features are clearly preferable to wavelet-based features for bioimage description. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
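To make the GLCM idea concrete, the following is a minimal sketch of how a co-occurrence matrix and a few Haralick-style statistics can be computed. It is not the authors' implementation: the function names `glcm` and `haralick_subset`, the single horizontal offset, the symmetric counting, and the choice of three statistics (contrast, energy, homogeneity) are assumptions made for illustration, and the 4×4 image is a toy example with four grey levels.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix as {(i, j): probability}.

    `image` is a list of rows of integer grey levels; (dx, dy) is the pixel
    offset. Both names are illustrative assumptions, not from the paper.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def haralick_subset(p):
    """Three common second-order statistics computed from a normalised GLCM."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())  # angular second moment
    homogeneity = sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Toy 4x4 image with grey levels 0..3 (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image)
features = haralick_subset(matrix)
print(features)
```

In practice such statistics are usually computed for several offsets and directions and then averaged or concatenated; the sketch uses one offset only to keep the arithmetic easy to follow.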

Highlights

  • Automated, image-based high-content screening is a fundamental tool for discovery in biological science

  • With interdisciplinary efforts from computer science and biology, scientists are able to carry out large-scale screening of cellular phenotypes, at whole-cell or sub-cellular levels, which is important in many applications, e.g., delineating cellular pathways, drug target validation and even cancer diagnosis [1,2]

  • We propose to construct and evaluate a random subspace classifier ensemble with a multilayer perceptron as the base classifier, using combined features from the curvelet transform and Haralick features
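The random subspace scheme described in the highlights (each base classifier sees a random subset of the features, and the ensemble decides by majority vote) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a tiny nearest-centroid learner stands in for the MLP base classifier so that the example runs with the standard library alone, and the class names, parameters (`n_members`, `subspace_size`, `seed`) and toy data are all hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (assumption: the paper uses an MLP here)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for k, value in enumerate(row):
                acc[k] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


class RandomSubspaceEnsemble:
    """Train each member on a random feature subset; predict by majority vote."""

    def __init__(self, n_members=5, subspace_size=2, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_members):
            # Sample feature indices without replacement for this member.
            idx = sorted(rng.sample(range(n_features), self.subspace_size))
            projected = [[row[k] for k in idx] for row in X]
            self.members.append((idx, NearestCentroid().fit(projected, y)))
        return self

    def predict_one(self, row):
        votes = Counter(
            model.predict_one([row[k] for k in idx]) for idx, model in self.members
        )
        return votes.most_common(1)[0][0]


# Hypothetical toy data: four features, two well-separated classes.
X = [[0.1, 0.3, 0.2, 0.0], [0.4, 0.1, 0.0, 0.2],
     [9.8, 10.2, 9.9, 10.1], [10.3, 9.7, 10.0, 9.9]]
y = ["a", "a", "b", "b"]
ensemble = RandomSubspaceEnsemble(n_members=5, subspace_size=2, seed=0).fit(X, y)
print(ensemble.predict_one([0.5, 0.2, 0.1, 0.3]))
print(ensemble.predict_one([9.5, 10.1, 9.8, 10.2]))
```

The two parameters that the abstract reports tuning correspond here to `subspace_size` (the dimensionality of each feature subset) and `n_members` (the ensemble size); an odd ensemble size avoids ties in a two-class vote.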


Summary

Introduction

Image-based high-content screening is a fundamental tool for discovery in biological science. Complex cellular structures, such as the molecular construction of a cell, can be studied through fluorescence microscopy images of cells with appropriate stains. Robotic systems can now automatically acquire thousands of images from cell assays, which are often referred to as “high-content” because of the large amount of information they carry. These images reflect the biological properties of the cell through many features, including size, shape, amount of fluorescent label, DNA content, cell cycle, and cell morphology. Genome-wide screens produce volumes of image data far beyond what humans can analyse manually, and automating the analysis of the large number of images generated in such screening is the bottleneck in realizing the full potential of cellular and molecular imaging studies.
