Abstract
Human immunodeficiency virus (HIV) is the retroviral agent that causes acquired immune deficiency syndrome (AIDS). HIV-related deaths numbered about 4 million in 2016 alone, and an estimated 33 million to 46 million people worldwide were living with HIV. HIV disease is especially harmful because the progressive destruction of the immune system impairs the ability to form specific antibodies and to maintain efficacious killer T cell activity. Successful prediction of HIV proteins is important for understanding their biological and pharmacological functions. In this study, based on the concept of Chou's pseudo amino acid (PseAA) composition and the increment of diversity (ID), a support vector machine (SVM), logistic regression (LR), and a multilayer perceptron (MP) were used to predict HIV-1 proteins and HIV-2 proteins. In the jackknife test, the highest prediction accuracy and correlation coefficient (CC) values, obtained by the SVM and MP, were 0.9909 and 0.9763, respectively, indicating that the classifiers presented in this study are suitable for discriminating between the two groups of HIV proteins.
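The jackknife test mentioned above is leave-one-out cross-validation: each protein is predicted by a model trained on all the remaining proteins, and the pooled predictions are then scored. The following is a minimal standard-library sketch, not the authors' code. It assumes binary labels (1 for HIV-1, 0 for HIV-2), assumes that CC denotes the Matthews correlation coefficient, and takes the classifier as a hypothetical `train_and_predict(train_x, train_y, query)` callable. A toy nearest-neighbor classifier stands in for the SVM, LR, or MP; in practice one of those models (for example from scikit-learn) would be wrapped in the same callable.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient (assumed meaning of CC) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def jackknife(samples, labels, train_and_predict):
    """Leave-one-out: predict each sample with a model trained on all the others."""
    xs, ys = list(samples), list(labels)
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(train_and_predict(train_x, train_y, xs[i]))
    return accuracy_and_mcc(ys, preds)


def nearest_neighbor(train_x, train_y, query):
    """Hypothetical toy 1-NN classifier (squared Euclidean distance), standing in for SVM/LR/MP."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]
```

Because every sample is held out once, the jackknife needs one model fit per sample; this is affordable for small curated datasets but becomes costly as the dataset grows.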
Highlights
Human immunodeficiency virus (HIV) is a retrovirus of the lentivirus genus; it is thought to have originated in non-human primates in sub-Saharan Africa and to have been transferred to humans in the 20th century[1,2,3,4]
In this study, the HIV-1 proteins and HIV-2 proteins were downloaded from the Swiss-Prot database[5], and the amino acid (AA) compositions and pseudo amino acid (PseAA) compositions of the HIV proteins were used as the input parameters of the increment of diversity (ID) algorithm
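As a rough illustration of these two kinds of input, the sketch below computes the 20-dimensional AA composition and a simplified type-1 PseAA composition in Chou's style: the 20 normalized residue frequencies are extended by `lam` sequence-order correlation factors weighted by `w`. Chou's formulation combines several physicochemical properties; this sketch assumes a single property, the Kyte-Doolittle hydropathy scale, standardized over the 20 residues. The values `lam=3` and `w=0.05` are illustrative rather than the settings used in the study, and nonstandard residue letters are simply dropped.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy: an assumed single-property stand-in for the
# several physicochemical properties used in Chou's original PseAA composition.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def _standardize(prop):
    """Zero mean, unit (population) standard deviation over the 20 residues."""
    mean = sum(prop.values()) / len(prop)
    sd = math.sqrt(sum((v - mean) ** 2 for v in prop.values()) / len(prop))
    return {aa: (v - mean) / sd for aa, v in prop.items()}


H = _standardize(KD)


def _clean(seq):
    """Keep only the 20 standard residue letters."""
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def aa_composition(seq):
    """Normalized occurrence frequency of each of the 20 standard amino acids."""
    seq = _clean(seq)
    if not seq:
        raise ValueError("sequence contains no standard residues")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pseaa_composition(seq, lam=3, w=0.05):
    """20 + lam components: composition plus lam sequence-order correlation factors."""
    seq = _clean(seq)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    f = aa_composition(seq)
    theta = []
    for k in range(1, lam + 1):
        # k-th tier correlation: mean squared property difference of residues k apart
        diffs = [(H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(len(seq) - k)]
        theta.append(sum(diffs) / len(diffs))
    denom = sum(f) + w * sum(theta)
    return [x / denom for x in f] + [w * t / denom for t in theta]
```

By construction the components of the PseAA vector sum to 1, so sequences of different lengths map to comparable fixed-length vectors.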
To further study the difference in amino acid usage, we compared the percentage of each amino acid between the HIV-1 proteins and HIV-2 proteins (Table 1)
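A comparison like the one in Table 1 can be sketched as follows. How the percentages were aggregated is not stated here, so this sketch assumes residues are pooled across all proteins of a group rather than averaged over per-protein compositions. The function names are hypothetical, and any sequences passed in for testing are toy strings, not HIV data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def group_percentages(sequences):
    """Percentage of each amino acid over all residues pooled from one group of proteins."""
    counts = Counter(aa for s in sequences for aa in s.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def compare_groups(group1, group2):
    """Per-residue percentage difference (group1 minus group2), largest absolute difference first."""
    p1, p2 = group_percentages(group1), group_percentages(group2)
    diffs = {aa: p1[aa] - p2[aa] for aa in AMINO_ACIDS}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Pooling gives longer proteins more weight; averaging per-protein compositions instead would weight every protein equally, and the two choices can rank residues differently.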
Summary
Human immunodeficiency virus (HIV) is a retrovirus of the lentivirus genus; it is thought to have originated in non-human primates in sub-Saharan Africa and to have been transferred to humans in the 20th century[1,2,3,4]. Several machine learning methods have been developed to predict different groups of proteins from sequence-derived features, with good prediction results. The present work reports machine learning methods for predicting HIV-1 proteins and HIV-2 proteins, using the concept of Chou's pseudo amino acid (PseAA) composition and the increment of diversity (ID). The ID is a measure of the overall uncertainty and total information of a system[26]. In recent years, this algorithm has been used to recognize protein structural classes[32], predict exon-intron splice sites[26], and predict conotoxin superfamilies[33]. We describe how to deal with these steps one by one.
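The ID is not defined explicitly in this section. The sketch below assumes the formulation commonly used in the ID literature: for a count vector X = (n_1, ..., n_s) with N = n_1 + ... + n_s, the diversity is D(X) = N log N - sum_i n_i log n_i, and the increment of diversity between a source X and a query Y is ID(X, Y) = D(X + Y) - D(X) - D(Y). It is zero when X and Y have identical composition and grows as they diverge. The `classify` function shows a minimum-ID decision rule as one possible use; how the study combines ID values with the SVM, LR, and MP is not described here, and all function names are hypothetical.

```python
import math


def diversity(counts):
    """D(X) = N log N - sum_i n_i log n_i with N = sum_i n_i (natural log; 0 log 0 taken as 0)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(source, query):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [x + y for x, y in zip(source, query)]
    return diversity(merged) - diversity(source) - diversity(query)


def classify(query, sources):
    """Assign the query to the class whose source count vector yields the smallest ID."""
    return min(sources, key=lambda label: increment_of_diversity(sources[label], query))
```

The logarithm base only rescales every ID value by the same factor, so it does not change which source gives the smallest ID. In a typical setup, each class's source vector would be the residue (or PseAA-derived) counts summed over that class's training proteins.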