Abstract

BackgroundNormalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data. Most normalization algorithms rely on the assumption that the number of up-regulated genes and the number of down-regulated genes are roughly the same. However, the well-known Golden Spike experiment presents a unique situation in which differentially regulated genes are biased toward one direction, thereby challenging the conclusions of previous bench mark studies.ResultsThis study proposes two novel approaches, KDL and KDQ, based on kernel density estimation to improve upon the basic idea of invariant set selection. The key concept is to provide various importance scores to data points on the MA plot according to their proximity to the cluster of the null genes under the assumption that null genes are more densely distributed than those that are differentially regulated. The comparison is demonstrated in the Golden Spike experiment as well as with simulation data using the ROC curves and compression rates. KDL and KDQ in combination with GCRMA provided the best performance among all approaches.ConclusionsThis study determined that methods based on invariant sets are better able to resolve the problem of asymmetry. Normalization, either before or after expression summary for probesets, improves performance to a similar degree.

Highlights

  • Normalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data

  • This study proposes a Kernel Density weighted Loess (KDL) method, adopting a concept similar to the detection of invariant sets through the estimation of density

  • 2.4 Simulation We have demonstrated that KDL and kernel density based quantile (KDQ) are both able to improve the bias caused by asymmetry between the number of up- and down-regulated genes

Read more

Summary

Introduction

Normalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data. When arrays are applied to strains that are inconsistent with the strain used to design the array, the probe intensities are all relatively lower than the standard strain at polymorphic sites Regardless of whether this is due to true differences in expression or hybridization strength, genome-wide expression distribution will be biased if there are a moderate number of polymorphic sites. The other situation requiring attention is small arrays tailored to specific applications [9]. This violates the common assumption that most genes are not differentially expressed across samples. All of the genes in small scale arrays are crucial to specific purposes; they are likely to change at different scales and might change in the same direction

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.