Abstract

experiments because gene-gene interactions can naturally occur. In this paper, we use an effective column size idea to take correlations among genes into account and modify the classical F test. We consider various magnitudes of correlation among genes in Monte Carlo simulation studies. We compare the proposed test (F-MOD) with the classical F test and the multivariate Hotelling's T² test through validity and power analyses. We also demonstrate the proposed test on real type 2 diabetes mellitus gene expression data, obtained from the Gene Expression Omnibus (GEO) database under accession number GSE25724.

Completion of the human genome sequence allows researchers to study the expression of 20,000-30,000 genes in a single assay. There are three types of platforms: short oligonucleotide (25-30 base), long oligonucleotide (50-80 base), and cDNA. The two most common platforms, however, are based on collections of cDNA clones (1) or short (25 base) oligonucleotides synthesized in situ by photolithographic methods (2). Although microarrays are the most extensively used technology for studying gene expression, they produce high-dimensional data, which makes statistical inference challenging (3). Several methods, such as clustering and classification, have been used to identify groups of genes that share similar functions (4,5). While these techniques are useful for finding similar genes, they do not answer the question of which genes are differentially expressed under different conditions (e.g. cancer cells versus normal cells). Answering that question requires hypothesis testing, with a null hypothesis of no difference in mean gene expression between conditions. Various statistical tests have been proposed involving fold change, linear models, and Bayesian methods (6-8); however, progress has been slow in adopting these methods in microarray analysis. Moreover, all of these methods are univariate.
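Hotelling's T² test, one of the two baselines named in the abstract, can be illustrated with a short sketch. The code below computes the standard two-sample statistic with a pooled covariance matrix and its F-scaled form. It does not implement F-MOD, whose formula is not given in this text. The function name `hotelling_t2` and the toy data are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T^2 with a pooled covariance matrix.

    x1, x2: arrays of shape (n_samples, n_genes), one row per sample.
    Returns (t2, f_stat, (df1, df2)); under the null hypothesis and
    multivariate normality, f_stat follows an F(df1, df2) distribution.
    This is the textbook statistic, not the paper's F-MOD test.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    if x2.shape[1] != p:
        raise ValueError("both groups must have the same number of genes")
    # The pooled covariance is singular unless n1 + n2 - 2 >= p, which is
    # why this test breaks down when genes far outnumber samples.
    if n1 + n2 - 2 < p:
        raise ValueError("need n1 + n2 - 2 >= number of genes")

    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    # Solve S * z = diff rather than inverting S explicitly.
    t2 = (n1 * n2 / (n1 + n2)) * float(diff @ np.linalg.solve(s_pooled, diff))
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (df1, df2)


# Toy data (made up for illustration): 3 samples per group, 1 gene.
t2, f_stat, df = hotelling_t2([[1.0], [2.0], [3.0]], [[2.0], [4.0], [6.0]])
```

With a single gene the statistic reduces to the squared two-sample t statistic. The sample-size check in the sketch also shows why a purely multivariate test is hard to apply to microarray data, where the number of genes usually far exceeds the number of samples.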

Highlights

  • Completion of the human genome sequence allows researchers to study expression of 20,000-30,000 genes in a single assay

  • Microarray data have a high-dimensional structure that makes statistical inference from this type of data challenging

  • Within each group, genes are either positively or negatively correlated, and because of their relative distance in the regulatory pathway, the further apart two genes are, the weaker the correlation between them. These are exactly the reasons why we considered the structures of Σ1 and Σ2 defined in (12) for microarray data
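As an illustration of a correlation structure that weakens with distance, the sketch below builds a matrix whose (i, j) entry is ρ^|i-j|. This is an assumed stand-in chosen for simplicity; the actual Σ1 and Σ2 from (12) are not reproduced in this text and may differ. The function name `decaying_correlation` and the parameter values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def decaying_correlation(p, rho):
    """Return a p x p matrix with entries rho ** |i - j|.

    Assumed structure for illustration only, not the paper's (12).
    A positive rho gives positively correlated genes; a negative rho
    gives correlations that alternate in sign. In both cases the
    magnitude shrinks as the distance |i - j| grows.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    idx = np.arange(p)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.power(float(rho), dist)


# Hypothetical example: 4 genes with rho = -0.5.
sigma = decaying_correlation(4, -0.5)

# Draw 10 simulated expression profiles with this correlation structure,
# as one might in a Monte Carlo study.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(4), sigma, size=10)
```

Varying `rho` in such a sketch is one simple way to explore different magnitudes of correlation among genes in a simulation.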

Summary

Introduction

Completion of the human genome sequence allows researchers to study the expression of 20,000-30,000 genes in a single assay. Microarrays are the most extensively used technology for studying gene expression, but they produce high-dimensional data, which makes statistical inference challenging [3]. Several methods, such as clustering and classification, have been used to identify groups of genes that share similar functions [4,5]. Various statistical tests have been proposed involving fold change, linear models, and Bayesian methods [6,7,8], but progress has been slow in adopting these methods in microarray analysis. All of these methods are univariate.
