Abstract

Background

Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to drug treatment. Predicting drug resistance in previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and can take into account particularities of HIV data, such as allele mixtures. They can also weigh each protein residue by its importance, since not all positions are known to contribute equally to resistance.

Results

We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known non-categorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, with weights obtained from the RF decrease in node impurity. The Jaccard kernel, in either its weighted or unweighted form, was the best method for 20 of the 21 drugs.

Conclusions

Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently yield the best prediction model. The advantage of including weights depended on the protein targeted by the drug. For reverse transcriptase, weights based on the relative importance of each position clearly increased prediction performance, while the improvement for the protease was much smaller. This seems to be related to the distribution of the weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.
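To make the kernels in the abstract concrete, here is a minimal Python sketch under one plausible formalization. It is not the catkern code, and every function name is hypothetical. It assumes each aligned position is stored as a set of amino acids (a singleton, or several letters when the isolate carries an allele mixture). It also assumes weights are non-negative and rescaled to sum to 1, and that the Overlap kernel treats a mixture as its own category. In the paper the weights come from the RF decrease in node impurity; here they are simply passed in. The Gini index of the weights uses the standard mean-absolute-difference formula.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

# One aligned sequence: each position holds the set of amino acids observed there.
# frozenset("K") is a pure residue; frozenset("KR") is a K/R allele mixture.
# Assumption: every position holds at least one amino acid (no empty sets).
Seq = Sequence[FrozenSet[str]]


def normalized_weights(n: int, weights: Optional[Sequence[float]] = None) -> List[float]:
    """Rescale weights to sum to 1; no weights means uniform (unweighted kernel)."""
    if weights is None:
        return [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per position")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return [w / total for w in weights]


def overlap_kernel(x: Seq, y: Seq, weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Overlap kernel: total weight of positions where x and y are identical.
    A mixture only matches the same mixture, i.e. it acts as one more category."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    w = normalized_weights(len(x), weights)
    return sum(wi if a == b else 0.0 for wi, a, b in zip(w, x, y))


def jaccard_kernel(x: Seq, y: Seq, weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Jaccard kernel: weighted mean of |a & b| / |a | b| over positions,
    so a mixture earns partial credit for the amino acids it shares."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    w = normalized_weights(len(x), weights)
    return sum(wi * len(a & b) / len(a | b) for wi, a, b in zip(w, x, y))


def gram_matrix(seqs: Sequence[Seq], kernel: Callable[..., float],
                weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Pairwise kernel matrix, e.g. for a learner that accepts precomputed kernels."""
    return [[kernel(a, b, weights) for b in seqs] for a in seqs]


def gini_index(weights: Sequence[float]) -> float:
    """Gini index of non-negative weights with a positive sum:
    0 when uniform, (n - 1) / n when all weight sits on one of n positions."""
    n, total = len(weights), sum(weights)
    abs_diffs = sum(abs(a - b) for a in weights for b in weights)
    return abs_diffs / (2 * n * total)


# Toy 4-position fragments; the second isolate carries a K/R mixture at position 2.
s1 = [frozenset("M"), frozenset("K"), frozenset("L"), frozenset("V")]
s2 = [frozenset("M"), frozenset("KR"), frozenset("L"), frozenset("I")]
w = [1, 6, 1, 2]  # hypothetical per-position importances, e.g. from a random forest

print(overlap_kernel(s1, s2))               # 0.5
print(jaccard_kernel(s1, s2))               # 0.625
print(round(overlap_kernel(s1, s2, w), 3))  # 0.2
print(round(jaccard_kernel(s1, s2, w), 3))  # 0.5
print(gini_index(w))                        # 0.4
```

With uniform weights, the K versus K/R mixture earns half credit under Jaccard but none under Overlap, so the two kernels give 0.625 and 0.5. Up-weighting that position drops Overlap to 0.2, while Jaccard keeps its partial credit (0.5). A Gram matrix built this way could be passed, for instance, to scikit-learn's `SVR(kernel="precomputed")`.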

Highlights

  • Antiretroviral drugs are a very effective therapy against HIV infection

  • The data are split into four databases (PI, nucleoside reverse transcriptase inhibitors (NRTI), non-nucleoside reverse transcriptase inhibitors (NNRTI) and INI), which contain between 1,000 and 3,500 HIV isolates

  • We compared our methods with artificial neural networks (ANN), which, to our knowledge, have achieved the best performance so far on this dataset [14]


Summary

Introduction

Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to drug treatment. Some of the main reasons why HIV is so difficult to fight are its short life cycle (1–2 days), its high replication rate (10⁸–10⁹ new virions each day) and its high mutation rate (10⁻⁴–10⁻⁵ mutations per nucleotide site per replication cycle), which results from the lack of proofreading activity of reverse transcriptase.
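The rates above can be combined in a quick back-of-the-envelope calculation. This Python sketch assumes a genome length of roughly 9,700 nucleotides, which the text does not state. It also simplifies by treating every new virion as the product of one replication cycle. Under those assumptions, each individual nucleotide site is mutated in roughly 10³ to 10⁵ new virions per day. A single genome copy picks up about 0.1 to 1 new mutation.

```python
# Rates quoted in the introduction, as (lower, upper) bounds.
VIRIONS_PER_DAY = (1e8, 1e9)
MUTATIONS_PER_SITE_PER_CYCLE = (1e-5, 1e-4)

# Assumption (not stated in the text): the HIV-1 genome is roughly 9,700 nucleotides long.
GENOME_LENGTH_NT = 9_700


def mutations_per_genome(mu: float, genome_length: int = GENOME_LENGTH_NT) -> float:
    """Expected new mutations in one genome after a single replication cycle."""
    return mu * genome_length


def mutants_per_site_per_day(virions: float, mu: float) -> float:
    """Expected number of new virions per day mutated at one given site.
    Simplification: each new virion counts as one replication cycle."""
    return virions * mu


low = mutants_per_site_per_day(VIRIONS_PER_DAY[0], MUTATIONS_PER_SITE_PER_CYCLE[0])
high = mutants_per_site_per_day(VIRIONS_PER_DAY[1], MUTATIONS_PER_SITE_PER_CYCLE[1])
print(f"virions mutated at a given site per day: {low:,.0f} to {high:,.0f}")
print(
    "mutations per genome per cycle: "
    f"{mutations_per_genome(MUTATIONS_PER_SITE_PER_CYCLE[0]):.2f} to "
    f"{mutations_per_genome(MUTATIONS_PER_SITE_PER_CYCLE[1]):.2f}"
)
```

Because every site is hit many times per day, any single resistance mutation is likely to be present somewhere in an untreated patient's virus population. This is why predicting resistance for unseen variants matters.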


