Abstract

To identify and characterize transcript structures ranging from transcriptional start sites (TSSs) to poly(A)-addition sites (PASs), we constructed and analyzed human TSS/PAS mate pair full-length cDNA libraries from 14 tissue types and four cell lines. The collected information enabled us to define TSS cluster (TSC) and PAS cluster (PAC) relationships for a total of 8530/9400 RefSeq genes, as well as 4251/5618 of their putative alternative promoters/terminators and 4619/4605 intervening transcripts, respectively. Analyses of the putative alternative TSCs and alternative PACs revealed that their selection appeared to be mostly independent, with rare exceptions. In those exceptional cases, pairs of transcript units rarely overlapped one another and were occasionally separated by Rad21/CTCF. We also identified a total of 172 similar cases in which TSCs and PACs spanned adjacent but distinct genes. In these cases, different transcripts may utilize different functional units of a particular gene or of adjacent genes. This approach was also useful for identifying fusion gene transcripts in cancerous cells. Furthermore, we were able to construct cDNA libraries in which 3′-end mate pairs were distributed randomly over the transcripts. These libraries were useful for assembling the internal structure of previously uncharacterized alternative promoter products, as well as intervening transcripts.
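The idea of linking TSS clusters to PAS clusters through mate pairs can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each mate pair is reduced to a (TSS, PAS) coordinate pair on a single chromosome strand, and it uses a simple single-linkage rule with a hypothetical 20 bp window (`window`) to form clusters. The helper names `cluster_sites`, `assign`, and `tsc_pac_pairs` are illustrative only.

```python
import bisect
from collections import Counter


def cluster_sites(positions, window=20):
    """Single-linkage clustering of 1-D site coordinates (hypothetical window).

    After sorting, a site joins the current cluster if it lies within
    `window` bp of the previous site; otherwise it starts a new cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def assign(pos, starts):
    """Index of the cluster containing `pos`, given sorted cluster start coordinates.

    Clusters are sorted and disjoint, so the containing cluster is the one
    with the largest start <= pos.
    """
    return bisect.bisect_right(starts, pos) - 1


def tsc_pac_pairs(mate_pairs, window=20):
    """Cluster 5' ends (TSCs) and 3' ends (PACs) separately, then count
    how many mate pairs link each (TSC index, PAC index) combination."""
    tscs = cluster_sites([tss for tss, _ in mate_pairs], window)
    pacs = cluster_sites([pas for _, pas in mate_pairs], window)
    tsc_starts = [c[0] for c in tscs]
    pac_starts = [c[0] for c in pacs]
    counts = Counter(
        (assign(tss, tsc_starts), assign(pas, pac_starts))
        for tss, pas in mate_pairs
    )
    return tscs, pacs, counts


if __name__ == "__main__":
    # Toy (TSS, PAS) coordinates; not data from the study.
    pairs = [(1000, 5000), (1005, 5010), (1012, 5003), (1500, 5008), (1503, 8000)]
    tscs, pacs, counts = tsc_pac_pairs(pairs)
    print(dict(counts))  # {(0, 0): 3, (1, 0): 1, (1, 1): 1}
```

In this toy input, a second TSC (an alternative promoter) shares the first PAC and also links to a separate PAC, which is the kind of TSC-PAC relationship the abstract describes. Real data would additionally need to be split by chromosome and strand before clustering.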

Highlights

  • To define gene regions in the genome and to identify the exact structures of the transcripts they encode, it is essential to know the exact transcriptional start site (TSS) and poly(A)-addition site (PAS).

  • When we compared the distributions of the Z-scores of TSS clusters (TSCs) and PAS clusters (PACs), we found that tissue bias in occurrence was more significant for TSCs than for PACs (P = 1E−62; Figure 4D).

  • The numbers of genes associated with the indicated Gene Ontology (GO) terms among the genes with ‘preferred’ TSC-PACs and in the total population are shown in the third and fourth columns, respectively.

Introduction

To define gene regions in the genome and to identify the exact structures of the transcripts they encode, it is essential to know the exact transcriptional start site (TSS) and poly(A)-addition site (PAS). The term gene itself, and the modular architecture of genes and genomes, could be defined by TSSs and PASs [1]. Accurate positional information on TSSs has been collected genome-wide through intensive analyses of so-called full-length complementary DNAs (cDNAs) obtained with cap-structure trapping technologies, such as oligo capping [3,4] and cap analysis of gene expression [5,6,7]. Information on PASs has been accumulated mainly from the 3′-end sequences of expressed sequence tags (ESTs) [8] and, more recently, from intensive RNA-Seq analyses [9,10]. The so-called PA-Seq method has also been developed to detect PASs [11].
