Abstract

Background
Hybridization of heterologous (non-specific) nucleic acids onto arrays designed for model organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in non-model organisms. However, conventional normalization methods that assume equivalent distributions (such as quantile normalization) are inappropriate when applied to non-specific (heterologous) hybridization. We propose an algorithm for normalizing and centering intensity data from heterologous hybridization that makes no prior assumptions about distribution, reduces the false appearance of homology, and provides a way for researchers to confirm whether heterologous hybridization is suitable.

Results
Data are normalized by adjusting for the Gibbs free energy of probe binding, and centered by adjusting for the median of a common set of control probes assumed to be equivalently dissimilar for all species. This procedure was compared with existing approaches and found to be as successful as Loess normalization at detecting sequence variations (deletions), and more successful than quantile normalization at reducing the accumulation of false-positive probe matches between two related nematode species, Caenorhabditis elegans and C. briggsae. Despite these improvements, probe fluorescence intensity remained too poorly correlated with sequence similarity to allow reliable detection of matching probe sequences.

Conclusions
Cross-species hybridization can be a way to adapt genome-enabled tools for closely related non-model organisms, but data must be normalized and centered in a way that accommodates hybridization of nucleic acids with diverged sequences. For short 25-mer probes, hybridization intensity alone may be insufficiently correlated with sequence similarity to allow reliable inference of homology at the probe level.
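A minimal sketch of the two steps named in the abstract, under stated assumptions. The summary does not give the exact form of the Gibbs free energy adjustment, so here it is assumed to be a linear least-squares fit of log intensity on each probe's predicted binding ΔG, with the residuals kept as the normalized values. Centering then subtracts the median residual of the control (non-target) probes, so the control set sits at zero on every array. The function names, the linear model, and the choice to fit the trend on all probes are illustrative assumptions, not the authors' implementation.

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normalize_and_center(log_intensity, delta_g, is_control):
    """Hypothetical two-step procedure for one array.

    1. Normalize: remove the (assumed linear) dependence of log intensity
       on each probe's predicted binding free energy, delta_g.
    2. Center: subtract the median residual of the control probes.
    """
    intercept, slope = fit_line(delta_g, log_intensity)
    residuals = [y - (intercept + slope * g)
                 for y, g in zip(log_intensity, delta_g)]
    control_residuals = [r for r, c in zip(residuals, is_control) if c]
    offset = median(control_residuals)
    return [r - offset for r in residuals]


# Toy data: three control probes and one target probe (ΔG in kcal/mol).
values = normalize_and_center(
    log_intensity=[1.0, 2.0, 3.0, 5.0],
    delta_g=[-10.0, -20.0, -30.0, -40.0],
    is_control=[True, True, True, False],
)
print([round(v, 2) for v in values])  # [0.3, 0.0, -0.3, 0.4]
```

Because the centering reference is the control set rather than a distribution assumed to be shared across species, a target probe's value is read relative to the non-target background on the same array.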

Highlights

  • Hybridization of heterologous nucleic acids onto arrays designed for model organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in non-model organisms

  • We hypothesized that quantile normalization of cross-species hybridization data can make the data appear reliable, but that this appearance may be misleading, reflecting false-positive probe matches created by artifacts of the data transformation

  • The major difference in our approach is that Machado and Renn normalize based on the 100 or 1000 most conserved genes, while we propose normalizing and centering based on control, non-target probes
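To illustrate the concern about quantile normalization, here is a sketch of the standard algorithm (a textbook version, not code from this work; ties are broken by position for simplicity). Each array's values are replaced by the across-array mean at the same rank, so every array ends up with an identical set of values. An array hybridized with heterologous DNA, whose signal is globally weaker, is therefore rescaled onto the same distribution as a homologous array. The probe values below are made up for illustration.

```python
def quantile_normalize(arrays):
    """Standard quantile normalization across equal-length arrays.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, lowest to highest
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical log intensities for the same four probes.
homologous = [10.0, 12.0, 14.0, 16.0]  # target-species DNA
heterologous = [2.0, 3.0, 9.0, 4.0]    # related-species DNA, weaker overall
hom_q, het_q = quantile_normalize([homologous, heterologous])
print(hom_q)  # [6.0, 7.5, 9.0, 12.5]
print(het_q)  # [6.0, 7.5, 12.5, 9.0]
```

After normalization the heterologous array's brightest probe receives the same value as the homologous array's brightest probe, regardless of how weakly it actually hybridized, which is the kind of artifact the hypothesis above describes.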


Summary

Introduction

Hybridization of heterologous (non-specific) nucleic acids onto arrays designed for model organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in non-model organisms. Hybridizing nucleic acids from non-model organisms onto DNA microarrays designed for closely related model organisms has been used as a potential alternative to building genomic resources for each species of interest. A common approach is first to screen probes for sequence conservation by hybridizing heterologous gDNA, and then to assess gene expression by hybridizing experimental cDNA and analyzing only the accepted probes [5]. This strategy has been applied to examine gene expression of various genera of Brassicaceae on an array containing Arabidopsis thaliana probes [5,6,7], expression of banana genes on a rice array [8], expression of horse genes on an array containing human probes [9], and expression of goat genes using a bovine array [10].
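The two-stage screening strategy can be sketched as a probe mask: probes are accepted on the basis of their gDNA signal, and only those probes are carried into the cDNA expression analysis. The threshold value, the function names, and the dictionary-of-probe-IDs layout are hypothetical. Choosing such a cutoff is exactly what the abstract cautions about, since for 25-mer probes intensity may correlate poorly with sequence similarity, so a fixed threshold may accept diverged probes or reject conserved ones.

```python
def screen_probes(gdna_signal, threshold):
    """Return IDs of probes whose heterologous gDNA signal reaches threshold.

    Probes below the (hypothetical) threshold are presumed too diverged
    in the non-model species to report its transcripts reliably.
    """
    return {probe for probe, signal in gdna_signal.items() if signal >= threshold}


def expression_on_accepted(cdna_signal, accepted):
    """Keep only cDNA measurements from probes that passed the gDNA screen."""
    return {probe: signal for probe, signal in cdna_signal.items() if probe in accepted}


# Hypothetical probe IDs and normalized intensities.
gdna = {"probe_a": 5.1, "probe_b": 0.7, "probe_c": 3.4}
cdna = {"probe_a": 8.2, "probe_b": 6.9, "probe_c": 2.5}
accepted = screen_probes(gdna, threshold=3.0)
print(sorted(accepted))                        # ['probe_a', 'probe_c']
print(expression_on_accepted(cdna, accepted))  # {'probe_a': 8.2, 'probe_c': 2.5}
```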
