Identifying circRNA-microRNA interactions (CMIs) has become a significant biomedical problem in recent years. Solving it provides insights into using circRNAs as biomarkers, developing cancer therapies, and producing cancer vaccines. Computational identification is a more time-efficient and cost-effective approach, and among computational methods, representing and exploring CMIs with graphs is the mainstream choice. However, existing methods do not achieve optimal results when combining the semantic information extracted from sequences with the topological information extracted from graph structures. To address this issue, we propose HGLMALLM, a graph contrastive learning method that learns node representations across both the semantic domain, generated via motif-aware pre-trained LLMs, and the topological domain, extracted from hierarchical graph structures. Our method effectively addresses a problem in existing Message Passing Neural Network (MPNN) methods, in which edge components lose heterogeneity after multiple iterations. Moreover, it exploits the heterogeneity of a graph that is extended, through the semantic domain, from the traditional bipartite graph to a heterogeneous graph. Two commonly used datasets were partitioned based on the distribution of node degrees, and we then benchmarked our method against existing methods. In the independent testing set evaluation, it achieved improvements of 3% and 1% on the two datasets. Our method also demonstrated the best stability in ten-fold cross-validation on the training set. A test conducted on the peripheral components shows that our model performs robustly. Finally, a dataset collected from real scenarios was used to demonstrate the strong predictive ability of our method for identifying previously unidentified CMIs.