Abstract

Background: Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available.

Results: We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may explain some of the large size. We predicted 12,852 protein coding genes, showing a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of non-coding repetitive DNA comprised of three distinct repeat units. Our results suggest that the region is highly polymorphic in P. westermani, possibly even within single worm isolates.

Conclusions: The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important, but neglected, parasite group.

Highlights

  • Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas

  • Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease

  • In India, the incidence of paragonimiasis caused by P. westermani is currently unknown [2,3,4]; many cases of paragonimiasis are attributed to the related worm Paragonimus heterotremus [2]


Summary

Background

Paragonimus lung flukes represent a significant and widespread clinical problem, with an estimated 23 million people infected worldwide [1]. The P. westermani genome size was estimated to be 1.1 Gb. PacBio sequence data were error-corrected with proovread version 2.13.13 [12], using Illumina short reads from the 200 bp and 450 bp libraries as input, and assembled into contigs with MIRA v4.0.2 (MIRA, RRID:SCR_010731) [13]. Published RNA-sequencing (RNA-seq) data from adult P. westermani [9] were obtained from the short-read archive and mapped to our genome assembly using the STAR aligner [33], version 2.5, with the option --twopassMode Basic. Predicted P. westermani coding genes were mapped to the genomes of related trematode species using Exonerate, version 2.4.0, requiring a minimal sequence identity of 30% and excluding matches spanning less than 40% of the query protein. It has further been estimated that the divergence of S. mansoni likely did not occur before 2–
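The two thresholds used for the Exonerate mapping (at least 30% sequence identity, and matches covering at least 40% of the query protein) can be illustrated with a minimal Python sketch. The `Hit` record and its field names are hypothetical and do not reflect Exonerate's output format, and the source does not say whether the thresholds were applied through Exonerate options or as a post-processing step; the sketch assumes the latter.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """Hypothetical record for one protein-to-genome match (not Exonerate's format)."""
    query_id: str
    target_id: str
    percent_identity: float  # 0-100
    query_start: int         # 0-based, inclusive
    query_end: int           # 0-based, exclusive
    query_length: int        # full length of the query protein, in residues


def query_coverage(hit: Hit) -> float:
    """Fraction of the query protein spanned by the match."""
    return (hit.query_end - hit.query_start) / hit.query_length


def passes_filters(hit: Hit,
                   min_identity: float = 30.0,
                   min_coverage: float = 0.40) -> bool:
    """Keep matches with identity >= 30% that span >= 40% of the query."""
    return (hit.percent_identity >= min_identity
            and query_coverage(hit) >= min_coverage)


def filter_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Return only the matches that satisfy both thresholds."""
    return [h for h in hits if passes_filters(h)]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        Hit("gene_a", "scaffold_1", 64.1, 0, 90, 100),  # kept
        Hit("gene_b", "scaffold_2", 25.0, 0, 90, 100),  # identity too low
        Hit("gene_c", "scaffold_3", 70.0, 0, 30, 100),  # spans too little of the query
    ]
    for hit in filter_hits(example):
        print(hit.query_id, hit.target_id)
```

A match spanning exactly 40% of the query is retained here, since the text excludes only matches spanning less than 40%.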

