Anti-coronavirus peptides (ACVPs) have garnered significant attention in COVID-19 therapeutic research due to their precise targeting, low risk of drug resistance, flexible synthesis, and effectiveness against viral mutations. Although some in-silico methods have been developed to predict ACVPs, they are hampered by limited datasets and a lack of interpretability. This study therefore introduces ACVPred, an algorithm for ACVP prediction built on two few-shot learning strategies: transfer learning and data augmentation. Our experiments demonstrate that data augmentation significantly enhances model performance, while transfer learning effectively prevents overfitting and strengthens generalizability. Compared to existing methods, ACVPred exhibits superior performance and robust generalization on both the training and independent test datasets. Moreover, an interpretability study of the model reveals that its transformer-based core effectively captures key motifs in ACVP sequences, demonstrating strong feature learning capabilities. The findings also suggest that sequence feature weights and key motif positions tend to be distributed towards the N-terminus of ACVP sequences, providing vital clues for the design of ACVPs. In summary, ACVPred is a practical and valuable tool for aiding in the design of ACVPs, and its algorithmic concept serves as an important reference for research on other small-sample prediction problems.