Abstract

Long intervening non-coding RNAs (lincRNAs) have been implicated in diverse biological processes, including p53 signaling and chromatin remodeling, but have not been thoroughly profiled in human cancers. Here, we developed an approach for ab initio reconstruction of poly-A+ transcriptome sequencing (RNA-seq) data for the unbiased discovery of novel transcripts. To accomplish this, we developed AssemblyLine, a method that clusters and filters large collections of transcripts to produce a consensus transcriptome. To demonstrate AssemblyLine, we sequenced a cohort comprising 81 prostatic tissues (20 benign, 47 localized tumors, and 14 metastases) and 21 prostatic cell lines using the Illumina Genome Analyzer II and generated 1.723 billion sequence fragments. We successfully aligned 1.42 billion reads with TopHat, a program capable of ab initio splice junction discovery, and then used Cufflinks to model sample-specific transcriptomes totaling 8.25 million transcripts. AssemblyLine condensed these 8.25 million transcripts into 35,415 distinct transcriptional loci, of which 1,859 (5.2%) represented candidate lincRNAs that lacked genomic overlap with known gene annotations. These putative RNAs lacked robust open reading frames, suggesting that the vast majority were non-coding. Further, they exhibited evolutionary conservation and were enriched for histone modifications that support independent transcriptional start sites and active transcription. Together, these results add confidence to AssemblyLine's nomination process and suggest that these novel lincRNAs may be transcriptionally active in prostate cancer. We then selected for further study 106 transcripts that were differentially expressed in localized prostate cancer compared with benign adjacent tissue (False Discovery Rate < 0.05), along with 15 transcripts with profound cancer outlier expression profiles.
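The condensation of many assembled transcripts into loci, followed by the selection of loci that lack overlap with known genes, can be illustrated with a minimal sketch. This is not AssemblyLine's actual algorithm, which the abstract does not specify; it simply assumes that same-strand transcripts with overlapping genomic coordinates belong to one locus, and that a candidate is any locus with no coordinate overlap with an annotated gene on either strand. The names `Interval`, `merge_into_loci`, and `unannotated` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def merge_into_loci(transcripts):
    """Collapse overlapping same-strand transcripts into loci (assumed rule)."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[(t.chrom, t.strand)].append(t)

    loci = []
    for (chrom, strand), items in sorted(groups.items()):
        items.sort(key=lambda t: t.start)
        cur_start, cur_end = items[0].start, items[0].end
        for t in items[1:]:
            if t.start < cur_end:
                # Overlaps the current locus: extend it.
                cur_end = max(cur_end, t.end)
            else:
                # Gap found: close the current locus and start a new one.
                loci.append(Interval(chrom, strand, cur_start, cur_end))
                cur_start, cur_end = t.start, t.end
        loci.append(Interval(chrom, strand, cur_start, cur_end))
    return loci


def overlaps(a, b):
    """True if two intervals share at least one base, ignoring strand."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unannotated(loci, known_genes):
    """Keep loci with no genomic overlap with any known gene."""
    return [l for l in loci if not any(overlaps(l, g) for g in known_genes)]


# Toy data, invented for illustration only.
transcripts = [
    Interval("chr1", "+", 100, 200),
    Interval("chr1", "+", 150, 300),
    Interval("chr1", "+", 500, 600),
    Interval("chr2", "-", 10, 50),
]
known_genes = [Interval("chr1", "+", 250, 400)]

loci = merge_into_loci(transcripts)
candidates = unannotated(loci, known_genes)
```

The pairwise overlap check is quadratic and only suitable for toy inputs; at the scale of millions of transcripts, a sorted sweep or an interval tree would be the more practical choice.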
These 121 Prostate Cancer Associated Transcripts (PCATs) accurately classified benign, localized, and metastatic prostate cancer tissues by unsupervised hierarchical clustering. Consistent with AssemblyLine's nominations, PCR-based experiments on selected transcripts in an independent tissue cohort showed high validation rates for the transcript structure and expression level predictions. Furthermore, in vitro studies of PCAT-1, a novel lincRNA observed in our dataset as highly upregulated in prostate cancer, revealed direct regulation by the histone methyltransferase EZH2. siRNA knockdown of PCAT-1 in LNCaP, a prostate cancer cell line, caused a 25-50% decrease in cell proliferation. Thus, this study establishes a paradigm for ab initio transcriptome annotation and discovery of novel lincRNAs in cancer tissues. Further, these results provide intriguing evidence that lincRNAs are aberrantly expressed and may play a role in prostate cancer progression.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 929. doi:10.1158/1538-7445.AM2011-929