Abstract
Background
Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. However, annotation needs to be refined and verified experimentally, as most predicted transcripts have been identified by comparative analysis with genomes from other species. The mosquito midgut, the first organ to interact with Plasmodium parasites, mounts effective antiplasmodial responses that limit parasite survival and disease transmission. High-throughput Illumina sequencing of the midgut transcriptome was used to identify new genes and transcripts, contributing to the refinement of An. gambiae genome annotation.

Results
We sequenced ~223 million reads from An. gambiae midgut cDNA libraries generated from susceptible (G3) and refractory (L35) mosquito strains. Mosquitoes were infected with either Plasmodium berghei or Plasmodium falciparum, and midguts were collected after the first or second Plasmodium infection. In total, 22,889 unique midgut transcript models were generated from both An. gambiae strain sequences combined, and 76% are potentially novel. Of these novel transcripts, 49.5% aligned with annotated genes and appear to be isoforms or pre-mRNAs of reference transcripts, while 50.5% mapped to regions between annotated genes and represent novel intergenic transcripts (NITs). Predicted models were validated for midgut expression using qRT-PCR and microarray analysis, and novel isoforms were confirmed by sequencing predicted intron-exon boundaries. Coding potential analysis revealed that 43% of total midgut transcripts appear to be long non-coding RNA (lncRNA), and functional annotation of NITs showed that 68% had no homology to current databases from other species. Reads were also analyzed using de novo assembly, and the predicted transcripts were compared with genome mapping-based models. Finally, variant analysis of G3 and L35 midgut transcripts detected 160,742 variants with respect to the An. gambiae PEST genome, and 74% were new variants. Intergenic transcripts had a higher frequency of variation compared with non-intergenic transcripts.

Conclusion
This in-depth Illumina sequencing and assembly of the An. gambiae midgut transcriptome doubled the number of known transcripts and tripled the number of variants known in this mosquito species. It also revealed the existence of a large number of lncRNAs and opens new possibilities for investigating the biological function of many newly discovered transcripts.

Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-636) contains supplementary material, which is available to authorized users.
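
The proportions reported in the abstract can be converted into approximate counts. The following is a minimal sketch in Python, assuming the published percentages are rounded values; the derived counts are therefore estimates, not figures reported by the authors, and the function and variable names are illustrative only.

```python
def approx_count(total: int, percent: float) -> int:
    """Return the nearest whole count for a given percentage of a total."""
    return round(total * percent / 100)


# Totals taken from the abstract.
TOTAL_TRANSCRIPTS = 22_889  # unique midgut transcript models
TOTAL_VARIANTS = 160_742    # variants relative to the An. gambiae PEST genome

# Estimated counts derived from the reported (rounded) percentages.
novel = approx_count(TOTAL_TRANSCRIPTS, 76)      # potentially novel transcripts
novel_genic = approx_count(novel, 49.5)          # isoforms or pre-mRNAs of annotated genes
novel_intergenic = novel - novel_genic           # novel intergenic transcripts (NITs)
lncrna = approx_count(TOTAL_TRANSCRIPTS, 43)     # putative long non-coding RNAs
new_variants = approx_count(TOTAL_VARIANTS, 74)  # variants not previously known

for label, value in [
    ("novel transcripts", novel),
    ("novel, within annotated genes", novel_genic),
    ("novel intergenic (NITs)", novel_intergenic),
    ("putative lncRNA", lncrna),
    ("new variants", new_variants),
]:
    print(f"{label}: ~{value:,}")
```

Because the inputs are rounded percentages, each estimate may differ from the true count by up to about half a percentage point of the corresponding total.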
Highlights
Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission
In Drosophila, whole-genome tiling-array expression analysis revealed that the initial genome sequence annotation had missed 30% of the transcripts [5], and in the P. falciparum malaria parasite, the first genome sequence contained errors in 25% of the predicted gene models [6]
We report the in-depth transcriptome analysis of the An. gambiae mosquito midgut using RNA-seq by Illumina sequencing with the goal of discovering new transcripts and improving the genome annotation, especially of midgut-expressed genes, as interaction of Plasmodium with this organ is critical for the parasite to establish an infection
Summary
Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. Malaria control has relied mainly on vector control (with insecticides and insecticide-impregnated nets) and on antimalarial therapy of infected humans. These strategies [...] epithelial barrier that parasites must traverse to complete their development, and cellular responses of invaded midgut cells have been shown to limit parasite survival [3]. The genome, published in 2002, was sequenced using a shotgun approach; gene prediction and annotation were done, in large part, in silico, based on homology with known genes from other species [4]. This is a powerful approach, but it has limitations: the predicted gene models can contain errors, and many transcripts, for example those unique to An. gambiae, could be missed. In Drosophila, whole-genome tiling-array expression analysis revealed that the initial genome sequence annotation had missed 30% of the transcripts [5], and in the P. falciparum malaria parasite, the first genome sequence contained errors in 25% of the predicted gene models [6].