Abstract

Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. Because P. falciparum is the malaria species responsible for the most deaths worldwide, the richness of the annotation and the accuracy of the sequence are important resources for the P. falciparum research community, as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002, over 60% of predicted genes had unknown functions. As of March 2019, this proportion had fallen to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been substantially refined: 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, the addition or deletion of exons, and the addition or deletion of genes. The sequence has also been improved considerably. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called PfRef1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.

Highlights

  • The genome of Plasmodium falciparum 3D7 (a clone from the NF54 isolate; Walliker et al, 1987) was the first reference genome published to support Plasmodium research; P. falciparum is the species responsible for the most severe form of malaria

  • The P. falciparum 3D7 version 3.2 assembly was compared with PacBio assemblies that we have recently described for several other isolates (Otto et al, 2018) to create a population reference that we have termed PfRef1

  • In the original genome project, published in 2002, the P. falciparum 3D7 apicoplast was not sequenced

Introduction

The genome of Plasmodium falciparum 3D7 (a clone from the NF54 isolate; Walliker et al, 1987) was the first reference genome published to support Plasmodium research; P. falciparum is the species responsible for the most severe form of malaria. The sequencing of P. falciparum was initially accompanied by the draft genome of a rodent malaria species, P. yoelii (Carlton et al, 2002). These genomes were followed by those of several other Plasmodium spp., sequenced using Sanger sequencing technology, including the human-infective species P. vivax (Carlton et al, 2008), the monkey and human malaria parasite P. knowlesi (Pain et al, 2008) and further rodent Plasmodium spp. (Hall et al, 2005). Although many of these genomes are highly fragmented draft assemblies, algorithms that use high coverage of aligned short reads have enabled a variety of cost-effective genome assembly improvements for several species (Swain et al, 2012).
