Abstract

A correct genome annotation is fundamental for research in molecular and structural biology. The annotation of the reference genome of Chaetomium thermophilum has been reported previously, but it is essentially limited to open reading frames (ORFs) of protein-coding genes and contains only a few noncoding transcripts. In this study, we identified and annotated full-length transcripts of C. thermophilum by deep RNA sequencing. We annotated 7044 coding genes and 4567 noncoding genes. Astonishingly, 23% of the coding genes are alternatively spliced. We identified 679 novel coding genes as well as 2878 novel noncoding genes and corrected the structural organization of more than 50% of the previously annotated genes. Furthermore, we substantially extended the Gene Ontology (GO) and Enzyme Commission (EC) lists, which provide comprehensive search tools for potential industrial applications and basic research. The identified novel transcripts and improved annotation will help to elucidate the gene regulatory landscape in C. thermophilum. The analysis pipeline developed here can be used to build transcriptome assemblies and identify coding and noncoding RNAs in other species.

Highlights

  • Chaetomium thermophilum is a thermophilic filamentous ascomycete, with the ability to grow at 50–52 °C

  • We identified 679 novel coding genes represented by 892 transcripts and isoforms, as well as 2878 novel noncoding genes represented by 3639 transcripts and isoforms

  • C. thermophilum belongs to the filamentous fungi, an economically important group that has developed into a relevant resource in the pharmaceutical and food processing industries, as well as in second-generation biofuel production, and has become a scientifically important model organism in basic research

Summary

Introduction

Chaetomium thermophilum is a thermophilic filamentous ascomycete, with the ability to grow at 50–52 °C. Owing to the thermostability of its proteins, the structures of many C. thermophilum proteins and protein assemblies have been solved at high resolution in various crystallization and cryo-electron microscopy studies, which has improved our understanding of the structural organization and function of higher-order protein complexes. These include the Crm export factor, the splicing factor Cwc, the mRNA export factor Mex67-Mtr, the FACT complex, the eukaryotic RAC chaperone, the nuclear pore Nsp1-channel complex, and the 90S pre-ribosomal complex [15,16,17,18,19,20,21,22]. We present extended Gene Ontology (GO) and Enzyme Commission (EC) lists associated with the protein-coding genes of C. thermophilum.
