Protein-protein interactions (PPIs) encompass the physical interactions or chemical combinations between proteins, which cells employ to carry out diverse biological functions. PPI prediction is of great significance for understanding molecular mechanisms and disease mechanisms in organisms. Traditional machine learning algorithms for predicting PPIs face challenges related to imbalanced sample data, loss of long-sequence features, and variations in datasets across different species. This paper exploits the strengths of the attention mechanism on sequence tasks and proposes an integrated learning model that fuses a convolutional neural network (CNN), a bidirectional long short-term memory network (Bi-LSTM), and a multi-head attention mechanism, referred to as CNN BiLSTM Multi-head Att. The model first one-hot encodes the protein sequence and then maps it through an embedding matrix to obtain a low-dimensional word-vector representation, which forms the encoding matrix. It then uses the CNN and the Bi-LSTM to extract temporal, site, and physicochemical information about the amino acids in the protein sequence, yielding the feature matrix. The multi-head attention mechanism weights and merges the subspace features of each sequence; after global average pooling, a classifier outputs the PPI prediction. Cross-validation experiments on datasets with different ratios of positive to negative samples and on proteins from different species show that CNN BiLSTM Multi-head Att predicts better on imbalanced datasets and on protein interactions across species. Compared to the state-of-the-art model, our model improves accuracy by 0.025, precision by 0.046, F1-score by 0.052, and recall by 0.014.
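As a minimal illustration of the encoding step described above, the following sketch one-hot encodes a short protein sequence and multiplies the result by an embedding matrix to obtain a low-dimensional representation. The amino-acid ordering, the embedding dimension of 4, the random initialization, and the names `one_hot`, `make_embedding`, and `encode` are assumptions made for illustration and are not taken from the paper; in a trained model the embedding matrix would be learned rather than random.

```python
import random

# Standard 20 amino acids, single-letter codes. The ordering is an
# arbitrary choice for this sketch, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Return one 20-dimensional indicator row per residue."""
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1.0  # raises KeyError on unknown residues
        rows.append(row)
    return rows


def make_embedding(dim, seed=0):
    """Hypothetical 20 x dim embedding matrix; random here, learned in practice."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in AMINO_ACIDS]


def encode(sequence, embedding):
    """Multiply the L x 20 one-hot matrix by the 20 x dim embedding matrix."""
    dim = len(embedding[0])
    encoded = []
    for row in one_hot(sequence):
        encoded.append(
            [sum(row[k] * embedding[k][j] for k in range(len(row))) for j in range(dim)]
        )
    return encoded


embedding = make_embedding(dim=4)
encoded = encode("MKV", embedding)  # 3 residues -> 3 rows of 4 values
```

Because each one-hot row contains a single 1, the matrix product simply selects the embedding row of the corresponding amino acid, which is why practical implementations usually replace the multiplication with an index lookup.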
Exploring the scalability and adaptability of the proposed model for large-scale biological datasets, along with investigating its potential in real-time applications, could lead to significant advancements in the field of protein–protein interaction prediction.