Abstract

BackgroundThe 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage.ResultsWe generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource.ConclusionsWe have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC).

Highlights

  • The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago

  • The germline and somatic functions of chromosomes are respectively ensured by a germline micronucleus (MIC) which undergoes meiosis and fertilization and a somatic macronucleus (MAC) that contains a version of the germline genome stripped of parasitic sequences and optimized for gene expression

  • Genetics of the ciliate Paramecium was pioneered nearly a century ago [2] and this complex unicellular eukaryote has since served as model for a variety of biological processes commonly found in animals, from excitable membranes and swimming behavior [3] to programmed DNA elimination and its epigenetic control during somatic differentiation [4]

Read more

Summary

Introduction

The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. Genetics of the ciliate Paramecium was pioneered nearly a century ago [2] and this complex unicellular eukaryote has since served as model for a variety of biological processes commonly found in animals, from excitable membranes and swimming behavior [3] to programmed DNA elimination and its epigenetic control during somatic differentiation [4]. Paramecium tetraurelia became the most widely used species for genetics and physiology, because of its convenient growth properties

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call