IDEPI: rapid prediction of HIV-1 antibody epitopes and other phenotypic features from sequence data using a flexible machine learning platform.

N Lance Hepler, Sergei L Kosakovsky Pond, Ben Murrell, Steven Weaver, Konrad Scheffler, Dennis R Burton, Pascal Poignard, Douglas D Richman, Davey M Smith

doi:10.1371/journal.pcbi.1003842

Abstract

Since its identification in 1983, HIV-1 has been the focus of a research effort unprecedented in scope and difficulty, whose ultimate goals, a cure and a vaccine, remain elusive. One of the fundamental challenges in accomplishing these goals is the tremendous genetic variability of the virus, with some genes differing at as many as 40% of nucleotide positions among circulating strains. Because of this, the genetic bases of many viral phenotypes, most notably the susceptibility to neutralization by a particular antibody, are difficult to identify computationally. Drawing upon open-source general-purpose machine learning algorithms and libraries, we have developed a software package, IDEPI (IDentify EPItopes), for learning genotype-to-phenotype predictive models from sequences with known phenotypes. IDEPI can apply learned models to classify sequences of unknown phenotypes, and also identify specific sequence features which contribute to a particular phenotype. We demonstrate that IDEPI achieves performance similar to or better than that of previously published approaches on four well-studied problems: finding the epitopes of broadly neutralizing antibodies (bNab), determining coreceptor tropism of the virus, identifying compartment-specific genetic signatures of the virus, and deducing drug-resistance associated mutations. The cross-platform Python source code (released under the GPL 3.0 license), documentation, issue tracking, and a pre-configured virtual machine for IDEPI can be found at https://github.com/veg/idepi.
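As a rough illustration of what a genotype-to-phenotype feature can look like, the sketch below encodes aligned sequences as binary "residue r at column i" indicators and lists the indicators that track a binary phenotype exactly. The function names, the three-residue toy alignment, and the labels are all hypothetical; this is not IDEPI's API, and a real model would be fitted statistically rather than by exact matching.

```python
def site_residue_features(aligned_seqs):
    """Encode aligned sequences as binary (column, residue) indicators.

    Only alignment columns that vary across the input are kept.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    names = []
    for col in range(length):
        residues = sorted({s[col] for s in aligned_seqs})
        if len(residues) > 1:  # skip invariant columns
            names.extend((col, r) for r in residues)
    matrix = [[1 if s[col] == r else 0 for (col, r) in names]
              for s in aligned_seqs]
    return names, matrix


def perfectly_associated(names, matrix, labels):
    """Return the features whose 0/1 value equals the label in every row."""
    return [name for j, name in enumerate(names)
            if all(row[j] == y for row, y in zip(matrix, labels))]


# Hypothetical toy alignment and phenotype labels (1 = has the phenotype).
seqs = ["NKT", "NKS", "DKT", "DRT"]
labels = [1, 1, 0, 0]

names, matrix = site_residue_features(seqs)
hits = perfectly_associated(names, matrix, labels)
print(hits)
```

In this toy case the only indicator that matches the labels in every sequence is asparagine (N) at column 0, which is the kind of site-level signal the abstract refers to when it mentions identifying sequence features that contribute to a phenotype.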

Highlights

The challenge of predicting a viral phenotype from sequence data has many motivating examples in HIV-1 research
IDEPI is customizable: different machine learning algorithms implemented in scikit-learn can be used; new sequence features can be defined using a well-specified application programming interface (API); various feature selection approaches can be used; performance can be optimized with respect to many metrics
Simulated data: In order to establish baseline performance of IDEPI where the true "phenotype" is known, we simulated the evolution of N~241 HIV-1 protein envelope sequences subject to directional selective pressure applied to sites in an epitope along a subset of terminal tree branches selected at random

Summary

Introduction

The challenge of predicting a viral phenotype from sequence data has many motivating examples in HIV-1 research. [2]) are well established and used both in research [3] and in clinical practice [4]. These algorithms have been developed based on large training sets using phenotypic assays, for example those measuring half maximal inhibitory concentration (IC50) of an antiretroviral drug (ARV) [5] to label sequences resistant or susceptible.

As a byproduct of bNab characterization, large panels of phenotypic (IC50) and matched envelope sequences have been generated, and several recent efforts [44,45,46,47,48] have been directed at applying machine learning techniques to these data in order to predict the resistance phenotypes of HIV-1 strains and to infer antibody epitopes.

IDEPI is customizable: different machine learning algorithms implemented in scikit-learn can be used; new sequence features can be defined using a well-specified application programming interface (API); various feature selection approaches (e.g. forward or backward selection) can be used; performance can be optimized with respect to many metrics (e.g. sensitivity).
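The pieces named above (IC50-based labels, a scikit-learn classifier, feature selection, and a chosen performance metric) can be sketched directly in scikit-learn. This is a minimal sketch under stated assumptions, not IDEPI's own interface or defaults: the feature matrix and IC50 values are synthetic, the cutoff `IC50_CUTOFF` is a hypothetical value, and recursive feature elimination (`RFE`) is used here as one backward-style selection method.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for binary sequence features: 60 sequences x 20 features.
X = rng.integers(0, 2, size=(60, 20))

# Made-up IC50 values in which feature 3 alone drives resistance.
ic50 = np.where(X[:, 3] == 1, 50.0, 1.0) * rng.uniform(0.5, 2.0, size=60)

IC50_CUTOFF = 20.0  # hypothetical threshold, chosen only for this example
y = (ic50 >= IC50_CUTOFF).astype(int)  # 1 = resistant, 0 = susceptible

pipeline = Pipeline([
    # Backward-style selection: repeatedly drop the weakest feature.
    ("select", RFE(LinearSVC(max_iter=10000, random_state=0))),
    ("clf", LinearSVC(max_iter=10000, random_state=0)),
])

search = GridSearchCV(
    pipeline,
    param_grid={
        "select__n_features_to_select": [1, 5],
        "clf__C": [0.1, 1.0],
    },
    scoring="recall",  # recall is sensitivity for the positive class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# Boolean mask of the features retained by the best pipeline.
selected = search.best_estimator_.named_steps["select"].support_
print("best cross-validated recall:", search.best_score_)
print("selected feature indices:", np.flatnonzero(selected))
```

Swapping the classifier, the selector, or the `scoring` string changes the model family, the selection strategy, or the optimized metric without touching the rest of the pipeline, which is the kind of flexibility the paragraph describes.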

Design and Implementation

Results

IDEPI performance

Stanford HIVdb

Availability and Future Directions

Supporting Information

Author Contributions

Journal: PLoS Computational Biology	Publication Date: Sep 25, 2014
Citations: 82	License type: CC BY 4.0
