Every year, an estimated 1.5 million people worldwide contract hepatitis C, a significant contributor to liver disease. Although many studies have explored the potential of machine learning (ML) to predict antiviral peptides, very few have addressed the prediction of peptides against specific viruses such as the hepatitis C virus (HCV). In this study, we demonstrate the application and fine-tuning of ML algorithms to predict peptides that are effective against HCV. We developed a fine-tuned and explainable ML model that uses the amino acid sequence of a peptide to predict its anti-hepatitis C potential. Specifically, features were computed from the sequence and its physicochemical properties. Feature selection was performed using a combined strategy of mutual information and variance inflation factor. This removed redundant and multicollinear features, improving the model's generalizability in predicting anti-hepatitis C peptides (AHCPs). The model based on the random forest algorithm performed best, with an accuracy of about 92%. Feature analysis shows that the distributions of hydrophobicity, polarizability, and coil-forming residues, the frequency of glycine residues, and the presence of the dipeptide motifs VL, LV, and CC are the key predictors for identifying AHCPs targeting different components of HCV. The developed model can be accessed through the Pred-AHCP web server at http://tinyurl.com/web-Pred-AHCP. This resource facilitates the prediction and re-engineering of AHCPs for designing peptide-based therapeutics, and suggests that similar strategies could be explored for designing peptide inhibitors against other viruses. The developed ML model can also be used to validate peptide sequences produced by generative artificial intelligence methods before further optimization.
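To make the sequence-derived predictors named above concrete, the following minimal sketch computes the glycine frequency and presence flags for the dipeptide motifs VL, LV, and CC from a peptide sequence. It is an illustration only, not the Pred-AHCP feature pipeline: the function name `sequence_features`, the feature keys, and the choice of binary presence flags are assumptions, and the physicochemical distribution features (hydrophobicity, polarizability, coil-forming residues) are not covered.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Dipeptide motifs named in the abstract as key predictors.
MOTIFS = ("VL", "LV", "CC")


def sequence_features(peptide: str) -> dict:
    """Hypothetical helper: glycine frequency and motif-presence flags."""
    seq = peptide.strip().upper()
    if not seq or any(res not in AMINO_ACIDS for res in seq):
        raise ValueError(f"not a standard amino acid sequence: {peptide!r}")

    counts = Counter(seq)
    # Fraction of residues in the peptide that are glycine.
    features = {"freq_G": counts["G"] / len(seq)}
    # 1.0 if the dipeptide occurs anywhere in the sequence, else 0.0.
    for motif in MOTIFS:
        features[f"has_{motif}"] = 1.0 if motif in seq else 0.0
    return features
```

A feature dictionary of this kind, computed per peptide, could form a few columns of the feature matrix that is then filtered by mutual information and variance inflation factor before a random forest classifier is trained.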