Abstract
BackgroundThe main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. Mixture model method (MMM hereafter) is a nonparametric statistical method often used for microarray processing applications, but is known to over-fit the data if the number of replicates is small. In addition, the results of the MMM may not be repeatable when dealing with a small number of replicates. In this paper, we propose a new version of MMM to ensure the repeatability of the results in different runs, and reduce the sensitivity of the results on the parameters.ResultsThe proposed technique is applied to the two different data sets: Leukaemia data set and a data set that examines the effects of low phosphate diet on regular and Hyp mice. In each study, the proposed algorithm successfully selects genes closely related to the disease state that are verified by biological information.ConclusionThe results indicate 100% repeatability in all runs, and exhibit very little sensitivity on the choice of parameters. In addition, the evaluation of the applied method on the Leukaemia data set shows 12% improvement compared to the MMM in detecting the biologically-identified 50 expressed genes by Thomas et al. The results witness to the successful performance of the proposed algorithm in quantitative pathogenesis of diseases and comparative evaluation of treatment methods.
Highlights
The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions
The nonparametric statistical methods, including the Empirical Bayes (EB) method [3], the significance analysis specialized for microarray data and the mixture model method (MMM) [5] have been specialized and applied for microarray data analysis
Rank based on MMM Rank based on K5M
Summary
The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. Mixture model method (MMM hereafter) is a nonparametric statistical method often used for microarray processing applications, but is known to over-fit the data if the number of replicates is small. The large data sets produced by microarray technology have resulted in the need for reliable, accurate, and robust methods for microarray data analysis. While various parametric methods and tests such as two-sample t-test [2] have been applied for microarray data analysis, strong parametric assumptions made in these methods as well as their strong dependency on large sample sets restrict the reliability of such techniques in microarray problems. Without ignoring the potential applicability of non-parametric methods in microarray processing applications, due to the claims made in [6], we have restricted the comparison of our methods only to the MMM based methods
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.