Abstract

Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5′ and 3′ untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

Highlights

  • Plasmodium falciparum malaria is responsible for more than one million deaths annually, most of which occur in young children (Breman, 2001)

  • In previous DNA microarray-based genome-wide transcriptome analyses, this effect has largely been ignored, with cDNA being synthesized directly from total RNA using oligo(dT) followed by in vitro reverse transcription (Chen et al, 2003) or a combination of oligo(dT) and random priming (Bozdech et al, 2003)

  • Sample mapped against ribosomal RNA (rRNA) loci targeted by our depletion strategies and covered only 1% of the genome more than 10 times

Summary

Introduction

Plasmodium falciparum malaria is responsible for more than one million deaths annually, most of which occur in young children (Breman, 2001). The development of new antimalarial compounds has been slow, mostly due to a lack of well-defined Plasmodium-specific targets, adding to a growing concern as established drugs become ineffective due to widespread resistance in the field (Arav-Boger and Shapiro, 2005). In 2002, the genome of the 3D7 clone of P. falciparum was sequenced (Gardner et al, 2002), renewing hope that progress towards reducing the burden of malaria would be greatly accelerated. The P. falciparum genome encodes roughly 5400 genes and has the lowest G+C content (19%) of any genome sequenced to date. Half of the predicted coding sequences (CDSs) are uncharacterized, with little sequence similarity outside the Plasmodium genus, and a large number of genes and gene families are unique to P. falciparum. The proteome contains a high proportion of low-complexity sequence in which poly-asparagine regions are highly prevalent (Aravind et al, 2003).
