The aim of this paper is to discuss the effect of missing values on the detection of differentially expressed genes in a cDNA microarray experiment in the context of a one-sample problem. We conducted a cDNA microarray experiment to detect differentially expressed genes for the metastasis of colorectal cancer, based on twenty patients who underwent liver resection due to liver metastasis from colorectal cancer. Total RNAs from the metastatic liver tumor and the adjacent normal liver tissue of a single patient were labeled with Cy5 and Cy3, respectively, and competitively hybridized to a cDNA microarray with 7775 human genes. We used M=log2(R/G) for the signal evaluation, where R and G denote the fluorescent intensities of the Cy5 and Cy3 dyes, respectively. The statistical problem is a one-sample test of E(M)=0 for each gene and therefore involves multiple testing. The data from the twenty cDNA microarrays would form a matrix of dimension 7775 by 20 if there were no missing values. However, missing values occur for various reasons. For each gene, the no missing proportion (NMP) was defined as the proportion of non-missing values out of twenty. In detecting differentially expressed (DE) genes, we first used the genes whose NMP was greater than or equal to 0.4 and then increased the NMP threshold sequentially in steps of 0.1 to investigate its effect on the detection of DE genes. For each fixed NMP, we imputed the missing values with the K-nearest neighbor method (K=10) and applied the nonparametric t-test of Dudoit et al. (2002), SAM by Tusher et al. (2001) and the empirical Bayes procedure of Lonnstedt and Speed (2002) to assess the effect of missing values on the final outcome. These three procedures yielded substantially concordant results in detecting DE genes. Of the three procedures, we used SAM to explore the acceptable NMP level. The results showed that the optimum NMP for this data set was 80%.
It is more desirable to find the optimum level of NMP for each data set by applying the method described in this note whenever the plot of (NMP, number of overlapping genes) shows a turning point.

Corresponding author: Byung Soo Kim (Tel: +82-22123-4541, Fax: +82-2-313-5331, Email: bskim@yonsei.ac.kr)
B.S. Kim’s study was supported by the Yonsei University Research Fund of 2001. S.Y. Rha’s study was supported by a grant of the IMT-2000 project, Ministry of Health & Welfare, Republic of Korea (01-PJ11-PG9-01BT00A-0028).

Introduction

The DNA microarray has been established as a major tool in biological research because it can monitor the expression levels of thousands of genes simultaneously under different conditions (Jin et al., 2001; Gibson, 2002; Hedenfalk, 2002; Olesiak, 2002; Ramaswamy, 2002; Huang, 2003; Keshave and Ong, 2003). Analyzing the data from a microarray experiment is not trivial, not merely because a large amount of data is involved, but because the data pose a non-standard statistical problem that is often referred to as a “large p, small n” problem (West, 2003). Typically, a microarray experiment has thousands of genes (=p) and tens of microarrays (=n). Several analysis tools, including SAM (Tusher et al., 2001) and BRB-ArrayTools (Simon and Peng), have been introduced in the public domain to guide laboratory scientists in the statistical analysis of microarray data.

Microarray experiment data can be represented by a p x n matrix, where the (i, j)th element of the matrix indicates the expression level of the i-th gene on the j-th microarray, i = 1, ..., p, and j = 1, ..., n. Missing values are observed quite often in this p x n matrix. They occur for various reasons, arising not only from technical problems but also from biological characteristics.
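To make the p x n representation concrete, the following minimal Python sketch builds a toy matrix of log ratios M = log2(R/G) with missing entries and computes the no missing proportion (NMP) of each gene, following the definition given in the abstract. The toy intensities, the use of NaN to mark a missing spot, and the function names (log_ratio, no_missing_proportion, filter_by_nmp) are illustrative assumptions and are not taken from the data or software of this study.

```python
import math

NAN = math.nan  # assumption: a missing spot is encoded as NaN


def log_ratio(red, green):
    """Return M = log2(R/G), or NaN when either intensity is missing."""
    if math.isnan(red) or math.isnan(green):
        return NAN
    return math.log2(red / green)


def no_missing_proportion(row):
    """Proportion of non-missing values for one gene (one matrix row)."""
    observed = sum(1 for value in row if not math.isnan(value))
    return observed / len(row)


def filter_by_nmp(matrix, threshold):
    """Indices of the genes whose NMP is greater than or equal to threshold."""
    return [i for i, row in enumerate(matrix)
            if no_missing_proportion(row) >= threshold]


# Toy example with p = 4 genes and n = 5 arrays (hypothetical intensities).
red = [
    [200.0, 400.0, NAN, 800.0, 100.0],
    [NAN, NAN, 300.0, NAN, 150.0],
    [100.0, 100.0, 100.0, 100.0, 100.0],
    [NAN, 50.0, NAN, 60.0, 70.0],
]
green = [[100.0] * 5 for _ in range(4)]

# Element (i, j) is the log ratio of gene i on array j.
M = [[log_ratio(r, g) for r, g in zip(red_row, green_row)]
     for red_row, green_row in zip(red, green)]

nmp = [no_missing_proportion(row) for row in M]
kept = filter_by_nmp(M, 0.6)

print("NMP per gene:", nmp)
print("Genes kept at NMP >= 0.6:", kept)
```

In the study itself, the genes retained at a given NMP level are subsequently imputed and tested; those steps are not shown in this sketch.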
Currently, as chip quality and hybridization techniques have improved to a certain level, missing values usually arise from biological causes, such as no expression of the specific genes in the sample or the