Abstract

In cancer genomics studies, variation in sample purity can greatly influence the ability to detect cancer-specific genomic aberrations. Genomics datasets are often analyzed as if samples consist of 100% cancerous or 100% benign cells, which is rarely true in practice. Particularly in prostate tumors, one always expects tumor samples to consist of a mixture of cancer cells, stromal cells, and benign cells. Here we use next-generation exome sequencing data to estimate the proportion of cancer cells in each sample, which may then be used for downstream analysis. Using human whole-exome capture data from 14 prostate tumors and matched benign tissues from the same patients, we developed a statistical method for estimating tumor content. Specifically, we generated a list of single-nucleotide variant (SNV) candidates derived from the sequencing data and used these candidates to fit a two-component binomial mixture model. One component is assumed to consist of experimental artifacts such as sequencing errors, which tend to exhibit low fractions of variant reads; the other is assumed to consist of true SNVs, whose variant fractions are related to the unknown tumor content. Estimation, achieved via the expectation-maximization (EM) algorithm, results in a probabilistic classification of the SNV candidates as well as an estimated proportion of cancer cells in each sample. As expected, 6 of 7 metastatic samples had high estimated tumor content (>70%). In contrast, among the 7 localized cancer samples, most of which showed no copy number aberrations by array comparative genomic hybridization (aCGH), only 3 had enough SNVs to reliably estimate tumor content. Tumor content in these samples varied: two of the three had tumor content of approximately 70%, while the third was estimated at 35%. Knowledge of a sample's tumor content may be used in any downstream analysis of genomic data; here we point out three applications.
First, this method allows for rigorous quality control and exclusion of samples based on tumor content as estimated from the data. Second, it can improve the quality of SNV calling from next-generation sequencing data. To test this, we performed Sanger sequencing on 30 SNV candidates from one of our localized cancer samples and compared validation status with the predictions from our model. Notably, our model predicted validation status perfectly on this set of candidates (18 validated; 12 did not validate). Third, precise knowledge of tumor content enabled us to explain variation in copy number profiles by aCGH. We found that log-ratios for regions of gain and loss were smaller in magnitude for samples with lower tumor content; this has major implications for calling aberrant regions from aCGH data. Thus, we anticipate that the methodology described here will be useful in refining many standard methods of analysis so that a clearer picture of aberrations in prostate cancer can emerge.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr LB-262. doi:10.1158/1538-7445.AM2011-LB-262
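
The two-component binomial mixture described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, starting values, and stopping rule are hypothetical, and both component rates are estimated freely here. The abstract states only that the variant fractions of true SNVs are "related to" tumor content; the sketch assumes, purely for illustration, heterozygous somatic SNVs in copy-neutral diploid regions, so that the expected variant fraction is half the tumor content.

```python
import math


def fit_two_component_binomial(variant_reads, depths, n_iter=500, tol=1e-9):
    """EM fit of a two-component binomial mixture over SNV candidates.

    variant_reads[i] is the number of variant reads and depths[i] the total
    read depth at candidate i. Returns (artifact_rate, snv_fraction,
    snv_weight, posteriors), where posteriors[i] is the probability from the
    final E-step that candidate i is a true SNV.
    """
    eps = 1e-9

    def clip(v):
        # Keep probabilities strictly inside (0, 1) so the logs stay finite.
        return min(max(v, eps), 1.0 - eps)

    # Hypothetical starting values: artifacts near 1% variant reads,
    # true SNVs near 30%, equal mixing weights.
    err, p, w = 0.01, 0.30, 0.5
    prev_ll = -math.inf
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each candidate is a true SNV.
        # The binomial coefficient is shared by both components, so it is
        # omitted; ll is the log-likelihood up to an additive constant.
        post, ll = [], 0.0
        for x, n in zip(variant_reads, depths):
            log_art = math.log(1 - w) + x * math.log(err) + (n - x) * math.log(1 - err)
            log_snv = math.log(w) + x * math.log(p) + (n - x) * math.log(1 - p)
            m = max(log_art, log_snv)
            log_tot = m + math.log(math.exp(log_art - m) + math.exp(log_snv - m))
            post.append(math.exp(log_snv - log_tot))
            ll += log_tot
        # M-step: responsibility-weighted updates of weight and rates.
        w = clip(sum(post) / len(post))
        snv_depth = sum(r * n for r, n in zip(post, depths))
        art_depth = sum((1 - r) * n for r, n in zip(post, depths))
        snv_reads = sum(r * x for r, x in zip(post, variant_reads))
        art_reads = sum((1 - r) * x for r, x in zip(post, variant_reads))
        p = clip(snv_reads / max(snv_depth, eps))
        err = clip(art_reads / max(art_depth, eps))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return err, p, w, post


def tumor_content_from_fraction(snv_fraction):
    # Assumption (not from the abstract): heterozygous somatic SNVs in
    # copy-neutral diploid regions, so variant fraction = tumor content / 2.
    return min(1.0, 2.0 * snv_fraction)
```

Calling `fit_two_component_binomial` on per-candidate variant read counts and depths yields the posterior probabilities that correspond to the probabilistic classification of SNV candidates, and `tumor_content_from_fraction` converts the fitted SNV variant fraction into a tumor content estimate under the stated assumption. Samples with few true SNVs give an unstable estimate, which is consistent with the abstract's remark that only 3 of the 7 localized samples had enough SNVs.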