Abstract

Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have been developed to predict the substrate sites cleaved by HIV-1 protease, but most follow a supervised learning scheme that builds classifiers by treating experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, since some unknown sites may in fact be cleavable (false negatives). The resulting classifiers are therefore biased and less accurate than they could be. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, PU-HIV, for effective prediction of HIV-1 protease cleavage sites. The features used by PU-HIV encode substrate sequences from different perspectives, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of the errors generated by positive and unlabeled samples, a biased support vector machine classifier is built to complete the prediction task. In benchmarking experiments using cross-validation and independent tests, PU-HIV outperformed state-of-the-art prediction models in terms of AUC, PR-AUC, and F-measure. With PU-HIV, it is thus possible to identify previously unknown substrate sites that exist physiologically and can be cleaved by HIV-1 protease, providing valuable insights for designing novel HIV-1 protease inhibitors for HIV treatment.
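The weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes the standard biased-SVM formulation, in which hinge errors on positive samples are penalised with one cost and hinge errors on unlabeled samples with another, usually smaller, cost. The alphabet ordering, the function names `one_hot_octamer` and `biased_svm_objective`, the example peptide, and the cost values are all illustrative assumptions and are not taken from PU-HIV itself. Only the amino acid identity encoding is shown; the coevolutionary and chemical features are omitted.

```python
# Sketch only: alphabet order, function names and cost values are
# illustrative assumptions, not taken from the PU-HIV implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_octamer(peptide):
    """Encode an 8-residue peptide as a 160-dimensional 0/1 vector."""
    if len(peptide) != 8:
        raise ValueError("expected an octamer")
    vec = []
    for residue in peptide:
        idx = AMINO_ACIDS.index(residue)  # ValueError for non-standard residues
        vec.extend(1.0 if i == idx else 0.0 for i in range(len(AMINO_ACIDS)))
    return vec


def biased_svm_objective(w, b, X, y, c_pos, c_unl):
    """Return 0.5*||w||^2 plus class-weighted hinge losses.

    y[i] is +1 for a verified cleavage site and -1 for an unlabeled site.
    Errors on positives are scaled by c_pos, errors on unlabeled by c_unl.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hinge = max(0.0, 1.0 - margin)
        loss += (c_pos if yi == 1 else c_unl) * hinge
    return reg + loss


if __name__ == "__main__":
    features = one_hot_octamer("SQNYPIVQ")  # an example octamer
    print(len(features))

    # Toy 2-D data: one positive sample, one unlabeled sample.
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1, -1]
    print(biased_svm_objective([0.5, 0.5], 0.0, X, y, c_pos=10.0, c_unl=1.0))
```

Setting `c_pos` larger than `c_unl` makes a misclassified verified site cost more than a misclassified unlabeled site, which reflects the fact that an unlabeled site may in truth be cleavable. A solver would minimise this objective over `w` and `b`; the sketch only evaluates it.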

Highlights

  • As the causative agent of acquired immunodeficiency syndrome (AIDS), human immunodeficiency virus type 1 (HIV-1) destroys the human immune system by spreading either as cell-free virus or directly from cell to cell (Abela et al, 2012)

  • Rögnvaldsson et al (2015) further adopted a linear SVM (LSVM) model combined with orthogonal coding, and reported that it predicted HIV-1 protease cleavage sites better than state-of-the-art prediction models

  • In order to evaluate the performance of PU-HIV, we conducted a series of extensive experiments and compared PU-HIV with several state-of-the-art prediction models including EvoCleave (Hu et al, 2020a), Rögnvaldsson et al (2015), HIVcleave (Shen and Chou, 2008), PROSPERous (Song et al, 2018), iProt-Sub (Song et al, 2019), and DeepCleave (Li et al, 2020)

Summary

Introduction

As the causative agent of acquired immunodeficiency syndrome (AIDS), human immunodeficiency virus type 1 (HIV-1) destroys the human immune system by spreading either as cell-free virus or directly from cell to cell (Abela et al, 2012). A series of laboratory experiments have been conducted to better understand the mechanisms of the HIV-1 replicative cycle. Their results indicate that HIV-1 protease (PR) plays an essential role in producing mature and infectious virions (Sadiq et al, 2012). HIV-1 PR ensures the maturation of HIV virions by cleaving the viral precursor Gag and Gag-Pol polyproteins, a step required to produce infectious virus particles (Weber et al, 1989). An effective strategy for HIV treatment is therefore to block HIV-1 replication by inhibiting the activity of PR.
