Abstract

Acquired immunodeficiency syndrome (AIDS) is a fatal disease that seriously threatens human health. Human immunodeficiency virus (HIV) is the pathogen that causes it. Investigating HIV-1 protease cleavage sites can help researchers find or develop protease inhibitors that restrain the replication of HIV-1 and thereby combat AIDS. Feature selection is a new approach to the HIV-1 protease cleavage site prediction task, and it is a key point of our research. Compared with previous work, our work has several advantages. First, a filter method is used to eliminate redundant features. Second, besides traditional orthogonal encoding (OE), we use two kinds of newly proposed features, extracted by applying principal component analysis (PCA) and non-linear Fisher transformation (NLF) to the AAindex database. The two new features are shown to perform better than OE. Third, the data set is substantially expanded to 1922 samples. To further improve prediction performance, we optimize the parameters of the support vector machine (SVM) classifier and fuse the three kinds of features to obtain a comprehensive feature representation. To evaluate the prediction performance of our method thoroughly, we use five evaluation measures, more than in previous work, for a complete comparison. The experimental results show that our method performs better than the state-of-the-art method. This indicates that feature selection combined with feature fusion and classifier parameter optimization can effectively improve HIV-1 cleavage site prediction. Moreover, our work can support the future development of HIV-1 protease inhibitors.

Highlights

  • Acquired immune deficiency syndrome (AIDS) is a highly lethal disease caused by infection with human immunodeficiency virus type 1 (HIV-1)

  • HIV-1 protease is a key enzyme in the virus replication process: it cleaves specific small proteins into smaller peptides, which go on to form the proteins indispensable for replication [1]

  • Comparing the results in the three tables, the best performance comes from fusing the three feature subsets and using SVM parameters optimized for classification accuracy

Summary

Introduction

Acquired immune deficiency syndrome (AIDS) is a highly lethal disease caused by infection with HIV-1. HIV-1 protease is a key enzyme in the virus replication process: it cleaves specific small proteins into smaller peptides, which go on to form the proteins indispensable for replication [1]. HIV-1 protease inhibitors bind firmly to the protease but cannot be cleaved, so the protease can no longer bind its substrates and its function is inhibited. Finding inhibitors in the laboratory by biological experiment alone is impractical, because there are too many kinds of peptides to test one by one. Take octapeptides as an example: with 20 kinds of amino acid residues in nature, there are 20^8 (about 2.56 × 10^10) possible octapeptides altogether. Testing that many octapeptides experimentally is impossible. Machine learning can be used to solve this problem [2].
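To make the size of the octapeptide search space concrete, and to illustrate the orthogonal encoding (OE) named in the abstract, here is a minimal Python sketch. It is not the authors' implementation. The residue ordering in `AMINO_ACIDS`, the function names, and the example peptide are assumptions chosen for illustration only; the sketch assumes OE maps each residue to a 20-dimensional indicator vector, so an octapeptide becomes a 160-dimensional binary vector.

```python
# Illustrative sketch only: names and residue ordering are assumptions,
# not taken from the paper.

# One-letter codes of the 20 standard amino acids (alphabetical order assumed).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_peptides(length: int) -> int:
    """Number of distinct peptides of the given length over 20 residues."""
    return len(AMINO_ACIDS) ** length


def orthogonal_encode(peptide: str) -> list[int]:
    """Encode a peptide as concatenated 20-dimensional indicator vectors."""
    vector: list[int] = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1  # exactly one position set per residue
        vector.extend(one_hot)
    return vector


if __name__ == "__main__":
    print(count_peptides(8))            # size of the octapeptide space, 20**8
    encoded = orthogonal_encode("SQNYPIVQ")  # example octapeptide, for illustration
    print(len(encoded), sum(encoded))   # vector length and number of set bits
```

A vector produced this way could be passed to any standard classifier; the sparsity of the representation (8 set bits out of 160) is one reason alternative features derived from physicochemical properties are worth considering.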
