Abstract

BackgroundStaining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns.ResultsWe have developed algorithms and associated software that extracts a feature representation of a gene expression pattern from an ISH image, that clusters genes sharing the same spatio-temporal pattern of expression, that suggests transcription factor binding (TFB) site motifs for genes that appear to be co-regulated (based on the clustering), and that automatically identifies the anatomical regions that express a gene given a training set of annotations. In fact, we developed three different feature representations, based on Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), and wavelet functions, each having different merits with respect to the tasks above. For clustering image patterns, we developed a minimum spanning tree method (MSTCUT), and for proposing TFB sites we used standard motif finders on clustered/co-expressed genes with the added twist of requiring conservation across the genomes of 8 related fly species. Lastly, we trained a suite of binary-classifiers, one for each anatomical annotation term in a controlled vocabulary or ontology that operate on the wavelet feature representation. We report the results of applying these methods to the Berkeley Drosophila Genome Project (BDGP) gene expression database.ConclusionOur automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy, despite variations in morphology, orientation, and focal plane suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.

Highlights

  • Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene

  • This set of regions is regarded as the gene expression pattern for this image and it is further partitioned using a 2D Gaussian Mixture Models (GMM) decomposition to obtain the set of local GMM blobs

  • This study has focused on the 2D in situ hybridization patterns within images generated by the Berkeley Drosophila Genome Project (BDGP)

Read more

Summary

Introduction

Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. A large body of work analyzing DNA micro-array data from microorganisms has demonstrated the value of gene expression analysis in understanding gene function and dissecting gene regulation [1,2,3] While these micro-array analyses can be extended to multi-cellular organisms by serially analyzing different cell-types and tissues, such work misses the complexity of expression patterns and relationship between patterns. Automatic analyses would make the process more rapid and consistent, and may identify biologically significant features missed during manual curation

Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.