Abstract

The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. As an alternative, several computational approaches have been developed to provide fast and reliable means of predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially those of gymnosperms, have received little attention and remain poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS prediction in other genomes, especially draft assemblies, for which reliable TSS predictions are not usually available. We also explored some features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides an initial basis for future experimental validation and for the study of regulatory regions to understand gene regulation in gymnosperms.

Highlights

  • Transcription is the mechanism by which the information encoded in protein-coding genes is transmitted; it is carried out by RNA Polymerase II and results in the production of messenger RNAs

  • The existing annotations for sequenced conifer genomes allow for computational prediction of biologically relevant elements, such as transcription start sites (TSSs) and transcription factor binding sites (TFBSs), and meaningful comparative analysis

  • The predicted TSSs and their putative promoter regions provide the basis for future experimental verification and present a valuable resource for better understanding gene regulation and investigating the evolutionary relationships between gymnosperm and angiosperm clades

Summary

Introduction

Transcription is the mechanism by which the information encoded in protein-coding genes is transmitted; it is carried out by RNA Polymerase II (RNA Pol II) and results in the production of messenger RNAs (mRNAs). This process is subject to complex regulation through the binding of transcription factors (TFs) to genomic sites consisting of regulatory nucleotide motifs, typically located within the 1000 bp region upstream of the transcription start site (TSS). The best-known regulatory motif in core promoter regions is the TATA-box, a recognition site for the TATA-binding protein (TBP). This motif has a highly conserved consensus sequence, TATA(A/T)A(A/T), found in 5–60% of all RNA Pol II promoters [1,4,5,6,7,8]. Other common core promoter elements are the TFIIB recognition elements (BREu, consensus (G/C)(G/C)(G/A)CGCC, and BREd, consensus (G/A)T(T/A)(T/G)(T/G)(T/G)(T/G) [11,12]), the downstream promoter element (DPE, consensus RGWYV [13,14]), and the downstream core element (DCE, consensus CTTC, CTGT, AGC [15]).
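
To make the consensus notation concrete, the following minimal Python sketch shows one way such motifs could be located in an upstream sequence by translating IUPAC ambiguity codes into a regular expression. It is an illustration only, not the prediction pipeline used in this study: the function names (`consensus_to_regex`, `find_motif`) and the example sequence are hypothetical, and the TATA-box consensus TATA(A/T)A(A/T) is written in IUPAC form as TATAWAW.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regex (hypothetical helper).

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are all reported.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(sequence, consensus):
    """Return (0-based start, matched text) for each occurrence on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

TATA_BOX = "TATAWAW"  # TATA(A/T)A(A/T) in IUPAC notation
DPE = "RGWYV"         # downstream promoter element consensus

# Invented example sequence, for illustration only.
upstream = "GCGCTATAAATAGGCCAGACGTGC"

tata_hits = find_motif(upstream, TATA_BOX)
dpe_hits = find_motif(upstream, DPE)
print(tata_hits)
print(dpe_hits)
```

An exact consensus match of this kind is deliberately simple and scans only the strand it is given. Short motifs such as RGWYV occur frequently by chance, so a practical analysis would also need to account for motif position relative to the TSS and would typically score candidates rather than accept every match.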
