Abstract

Background: Although thousands of clinical isolates of Plasmodium falciparum are being sequenced and analysed by short read technology, the data do not resolve the highly variable subtelomeric regions of the genomes that contain polymorphic gene families involved in immune evasion and pathogenesis. There is also no current standard definition of the boundaries of these variable subtelomeric regions. Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated the genomes of 15 P. falciparum isolates, ten of which are newly cultured clinical isolates. We performed comparative analysis of the entire genome with particular emphasis on the subtelomeric regions and the internal var genes clusters. Results: The nearly complete sequence of these 15 isolates has enabled us to define a highly conserved core genome, to delineate the boundaries of the subtelomeric regions, and to compare these across isolates. We found highly structured variable regions in the genome. Some exported gene families purportedly involved in release of merozoites show copy number variation. As an example of ongoing genome evolution, we found a novel CLAG gene in six isolates. We also found a novel gene that was relatively enriched in the South East Asian isolates compared to those from Africa. Conclusions: These 15 manually curated new reference genome sequences with their nearly complete subtelomeric regions and fully assembled genes are an important new resource for the malaria research community. We report the overall conserved structure and pattern of important gene families and the more clearly defined subtelomeric regions.

Highlights

  • Plasmodium falciparum is the most virulent of malaria parasites that infect humans and is responsible for hundreds of thousands of deaths each year[1]

  • ~25,000 indel discrepancies and ~1800 single nucleotide substitutions (Table 1) were initially found, correcting the contigs by iteratively mapping Illumina reads reduced the number of indels to ~5,800 and the single-nucleotide substitutions to ~1,000

  • We observed large numbers of errors in the analysis. Homopolymers and (TA) repeat sequences. 96% of these errors were in repeats > 14bp, so they were excluded from the analysis

Read more

Summary

Introduction

Plasmodium falciparum is the most virulent of malaria parasites that infect humans and is responsible for hundreds of thousands of deaths each year[1]. In 2002, the first reference genome for P. falciparum was published using a lab adapted clone (3D7) that originated from an infection in Africa[2]. The limitations of using this single clone as a reference to map genome variation from all clinical isolates are largely unknown. Because of the highly polymorphic nature of the subtelomeric and internal var gene sequences, most short reads from these areas cannot be aligned to the reference genome. Thousands of clinical isolates of Plasmodium falciparum are being sequenced and analysed by short read technology, the data do not resolve the highly variable subtelomeric regions of the genomes that contain polymorphic gene families involved in immune evasion and pathogenesis. We performed comparative analysis of the entire genome with particular emphasis on the subtelomeric regions and the internal var genes clusters

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call