Abstract

Background: HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) co-receptor, which interacts primarily with the third hypervariable loop (V3 loop) of gp120. Determining HIV-1 affinity for the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists in patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is used to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure, employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, is used to generate a feature vector representation for each variant, whose components measure environmental perturbations at corresponding structural positions.

Results: Classifier performance is evaluated using stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (0.89), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays.

Conclusions: The trained classifiers provide instantaneous and reliable predictions of HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of these computational mutagenesis-based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that use sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.
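
The performance measures listed in the abstract all derive from a binary confusion matrix. The sketch below uses their standard textbook definitions, taking X4-capable as the positive class; the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration. They are not results from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    Positive class is assumed to be X4-capable, negative class R5.
    Assumes each class has at least one member (tp + fn > 0, tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    # Precision is undefined when nothing is predicted positive; report 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient, in [-1, 1]; 0.0 if denominator is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Balanced error rate: mean of the false negative and false positive rates.
    ber = 0.5 * ((1 - sensitivity) + (1 - specificity))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
        "ber": ber,
    }


# Hypothetical toy counts, NOT taken from the paper.
m = binary_metrics(tp=8, fn=2, tn=45, fp=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the dataset is imbalanced (989 R5 versus 204 X4-capable sequences), accuracy alone can look favorable even when the minority class is poorly predicted, which is one reason to report the Matthews correlation coefficient and balanced error rate alongside it.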

Highlights

  • Host cells targeted for entry by HIV-1 express the cellular CD4 receptor as well as a secondary cellular chemokine co-receptor, principally either CCR5 (R5) or CXCR4 (X4), all of which interact with the HIV-1 envelope glycoprotein gp120

  • Variant V3 loop feature vectors generated by the combined sequence-structure in silico mutagenesis methodology described in this manuscript have been shown to encode signals that robustly discriminate between the R5 and DM classes, yielding universally reliable predictive models based on a variety of supervised classification machine learning algorithms

Summary

Introduction

HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) co-receptor, which interacts to a great extent with the third hypervariable loop (V3 loop) of gp120 [6]. The V3 loop is a peptide fragment distant from the gp120 core, composed of 35 amino acids with a disulfide bridge formed by cysteine residues at the N- and C-termini (Fig. 1). This interaction suggests that accumulation of amino acid replacements at multiple positions within the V3 loop is responsible for the eventual switch in co-receptor affinity. There are competing arguments as to whether V3 loop structural changes drive co-receptor selectivity, or whether one predominant conformation exists for both R5 and X4 variants and sequence changes alone account for the switch in co-receptor usage [7,8]. Evidence suggesting a dual contribution was provided by a study in which knowledge-based potentials were used to assess the fitness of variant V3 loop sequences on candidate structures generated by Markov Chain Monte Carlo techniques applied to NMR data [9].
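
The structural description above (35 residues, with cysteines at both termini forming the disulfide bridge) can be turned into a simple sanity check on input sequences. This is a minimal sketch, not part of the authors' pipeline: the function name `looks_like_v3_loop` and the example sequence are assumptions made for illustration, and the sequence is not taken from the paper's dataset.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def looks_like_v3_loop(seq, expected_length=35):
    """Return True if seq matches the V3 loop description in the text:
    expected_length standard residues with cysteine at both termini.
    This checks form only; it says nothing about co-receptor usage.
    """
    seq = seq.strip().upper()
    return (
        len(seq) == expected_length
        and seq[0] == "C"
        and seq[-1] == "C"
        and set(seq) <= STANDARD_AMINO_ACIDS
    )


# Illustrative 35-residue V3-like sequence (an assumption for this example).
example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
print(looks_like_v3_loop(example))
```

A check like this only filters malformed input, and the fixed length of 35 is taken from the description above.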
