Abstract
The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
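The network construction described in the abstract can be illustrated with a minimal sketch. It assumes a fixed length threshold, `min_len`, as a stand-in for "an exact match of significant length"; how significance is actually determined is not given here. The function names and toy sequences are hypothetical, and the HVR identification and stochastic block model steps are not shown.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_shared_block_network(seqs, min_len):
    """Build an undirected network as a dict of adjacency sets.

    Each sequence name is a node. Two nodes are linked if their
    sequences share an exact substring of length >= min_len, which is
    equivalent to sharing at least one substring of length min_len.
    min_len is an assumed, user-chosen threshold.
    """
    kmer_sets = {name: kmers(seq, min_len) for name, seq in seqs.items()}
    adjacency = {name: set() for name in seqs}
    for a, b in combinations(seqs, 2):
        if kmer_sets[a] & kmer_sets[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy amino-acid fragments (hypothetical, not real var gene data).
toy_seqs = {
    "s1": "ACDEFGHIK",
    "s2": "QQDEFGHWW",
    "s3": "MNPQRSTVW",
    "s4": "YYPQRSTAA",
}
network = build_shared_block_network(toy_seqs, min_len=5)
```

In this toy input, `s1` and `s2` share the block `DEFGH` and `s3` and `s4` share `PQRST`, so the network splits into two disconnected pairs. Comparing all pairs of sequences is quadratic in the number of sequences, which is acceptable for a sketch but may need indexing for large datasets.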
Highlights
The human malaria parasite Plasmodium falciparum causes approximately 1 million deaths each year, primarily in young children in sub-Saharan Africa [1].
While there are many different classes of these domains, indexed by α, β, etc., the N-terminal region of the protein almost always begins with a Duffy Binding Like-α (DBLα) and CIDRα pair, each of which has been implicated in the binding of infected red blood cells to various host receptors as well as in different disease pathologies.
We examined the highly variable region (HVR) networks for evidence of strong associations between network community structure and subclassifications, but found that only DBLα1.3 sequences had a strong tendency to link to other sequences from the same subclassification. This occurred in HVRs 7, 8, and 9, where they formed cliques on the periphery of the networks.
Summary
1 Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America, 2 Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, Massachusetts, United States of America, 3 Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America, 4 BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America, 5 Santa Fe Institute, Santa Fe, New Mexico, United States of America