Abstract

The transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations, and most of the novel RNAs being produced appear not to encode proteins. This has transformed our understanding of genome complexity in recent years and suggested new paradigms of genome regulation. However, the fraction of the genome used to produce cellular RNAs of unknown function and, even more so, the relative mass of these RNAs in a cell remain controversial.

RNA from normal human liver and brain, the K562 leukemia cell line, and six paired Ewing primary and metastatic tumors was converted into cDNA using random hexamers and sequenced using single-molecule sequencing (SMS). No amplification, ligation, or size selection was used, thus minimizing methodological biases. PolyA+ RNA, total RNA, and total RNA depleted of ribosomal RNA were studied. The SMS reads were aligned to the complete human genome, and uniquely mapping reads from human tissue sources were further filtered to exclude sequences aligning to rDNA, to the mitochondrial genome, or to genomic repeats annotated by the RepeatMasker program as rRNA. The remaining informative reads were used for subsequent analyses, including comparison to known annotations defined by the exons of UCSC Genes.

This investigation makes the following key observations.

1. We show that the so-called "dark matter RNAs", which are mostly non-coding, not only exist in human cells but can comprise the majority of total non-ribosomal, non-mitochondrial RNA. We estimate that half to two-thirds of all such RNA in a human cell is non-coding.
2. We show a significant loss of this complexity when only polyA+ RNA is profiled. Most, if not all, contemporary RNA-seq papers continue to focus on this type of RNA and thus report results that are significantly skewed with respect to the true complexity of human RNA.
3. We show the presence of a large number of very long (hundreds of kb), abundant intergenic transcribed regions located in areas of the genome that are devoid of protein-coding annotations. We show evidence that these very long and likely non-coding RNA transcripts are expressed during normal development, silenced in adult tissues, and then re-activated during cancer progression.

Our understanding of the repertoire of human RNAs remains far from complete, and almost all RNA-seq studies have missed this complexity because of the limited view obtained when using only the polyA+ RNA fraction. Moreover, many novel genomic regions give rise to RNAs that are differentially expressed in different tumor types and in primary versus metastatic tumors derived from the same patient. This raises the tantalizing possibility that a great number of hitherto uncharacterized RNAs are involved in tumorigenesis and could be used as both diagnostic and, potentially, therapeutic targets.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 1177. doi:10.1158/1538-7445.AM2011-1177
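
The read-filtering step described in the abstract (keep uniquely mapping reads, drop those aligning to rDNA, the mitochondrial genome, or RepeatMasker rRNA repeats, then compare the remainder to UCSC Genes exons) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' pipeline: the `Read` and `Interval` types, the function names, the `chrM` label, and the toy coordinates are all hypothetical, and "outside annotated exons" is used here as a crude read-count stand-in for the abstract's non-coding RNA estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int
    end: int
    n_alignments: int  # number of genomic locations the read aligns to


def overlaps(read, intervals):
    """True if the read overlaps any interval on the same chromosome."""
    return any(
        iv.chrom == read.chrom and read.start < iv.end and iv.start < read.end
        for iv in intervals
    )


def informative_reads(reads, excluded, mito_chrom="chrM"):
    """Keep uniquely mapping reads that avoid the mitochondrial genome
    and every excluded interval (e.g. rDNA, rRNA-annotated repeats)."""
    kept = []
    for r in reads:
        if r.n_alignments != 1:      # multi-mapping or unmapped read
            continue
        if r.chrom == mito_chrom:    # mitochondrial genome
            continue
        if overlaps(r, excluded):    # rDNA / rRNA repeat
            continue
        kept.append(r)
    return kept


def fraction_outside_exons(reads, exons):
    """Fraction of reads that overlap no annotated exon."""
    if not reads:
        return 0.0
    outside = sum(1 for r in reads if not overlaps(r, exons))
    return outside / len(reads)


# Toy data with made-up coordinates, for illustration only.
excluded = [Interval("chr1", 100, 200)]   # hypothetical rRNA repeat
exons = [Interval("chr1", 1000, 1100)]    # hypothetical annotated exon
reads = [
    Read("r1", "chr1", 1010, 1040, 1),  # exonic, kept
    Read("r2", "chr1", 5000, 5030, 1),  # outside exons, kept
    Read("r3", "chr1", 150, 180, 1),    # in rRNA repeat, dropped
    Read("r4", "chrM", 10, 40, 1),      # mitochondrial, dropped
    Read("r5", "chr2", 10, 40, 3),      # multi-mapping, dropped
    Read("r6", "chr2", 500, 530, 1),    # outside exons, kept
]

kept = informative_reads(reads, excluded)
frac = fraction_outside_exons(kept, exons)
print([r.name for r in kept], round(frac, 3))
```

The linear scan in `overlaps` is chosen for readability; on real data an interval index or sorted-interval sweep would be the usual choice.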

