Abstract

MicroRNAs play a crucial role in post-transcriptional regulation, influencing over 60% of human protein-coding genes by targeting specific mRNA sites to suppress protein translation. Various predictive algorithms aim to identify potential microRNA-mRNA pairs. Current approaches primarily employ sequence alignment, machine learning, and deep learning, yet they face challenges such as complex data preprocessing, time-consuming model generation, and limited binding-site precision. In addition, some methods rely on RNN-based models, which make prediction slow. To address these issues, we propose a CNN-based algorithm combined with transfer learning that predicts microRNA-mRNA binding sites directly and precisely, without extensive preprocessing. We introduce two models in this study: the per-based model and the miRNA-target binding decision model. The former screens potential target sites on 3’-UTR sequences, while the latter decides whether a given miRNA-target pair binds. For the per-based model, we extracted 786,447 human microRNA-mRNA pairs verified by CLIP-seq from a public database and used sequence alignment to determine putative binding sites and per-base binding states. MicroRNA sequences and seed regions served as the initial convolutional kernels of the deep learning model, and the encoded full-length 3’-UTR sequences of the mRNAs served as inputs to the fine-tuned U-Net architecture. For the miRNA-target binding decision model, we removed the per-based model's decoder and initialized a new classification layer for target-site prediction. From experimentally validated datasets in public databases, we extracted 2,846 binding and 1,058 non-binding human microRNA-mRNA pairs, and the pre-trained model was fine-tuned on these 3,904 pairs. Both models were trained with a cyclical learning rate, focal loss, gradient clipping, and weight decay to cope with the class imbalance in the datasets.
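To make the idea of using microRNA seed regions as initial convolutional kernels more concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the one-hot encoding order, the choice of nucleotides 2-8 as the seed, the function names (`one_hot`, `seed_kernel`, `scan`), and the toy 3'-UTR fragment are all hypothetical. A kernel built from the reverse complement of the seed, slid along a one-hot encoded 3'-UTR, yields a per-position score equal to the number of matching bases, which is the kind of signal a trainable convolution could start from.

```python
# Hypothetical sketch: names, encoding, and scoring rule are illustrative
# assumptions, not the method described in the abstract.
BASES = "ACGU"
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def one_hot(seq):
    """Encode an RNA sequence as rows of 4 values in the order A, C, G, U."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def seed_kernel(mirna, start=1, end=8):
    """Return the seed-match site (reverse complement of nucleotides 2-8)
    and its one-hot encoding, to be used as a convolution kernel."""
    seed = mirna.upper().replace("T", "U")[start:end]
    site = "".join(COMPLEMENT[b] for b in reversed(seed))
    return site, one_hot(site)


def scan(utr, kernel):
    """Cross-correlate the kernel along the 3'-UTR.
    Each score is the number of bases in the window that match the kernel."""
    x = one_hot(utr)
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(4))
        scores.append(s)
    return scores


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p
utr = "AAGCUACCUCAGGA"            # made-up 3'-UTR fragment for illustration
site, kernel = seed_kernel(mirna)
scores = scan(utr, kernel)
best = max(range(len(scores)), key=scores.__getitem__)
print(site, best, scores[best])   # CUACCUC 3 7.0
```

In a trainable model these kernel weights would only be a starting point that gradient descent then adjusts; the sketch shows the initialization step alone.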
The data were split into 80% training and 20% testing sets, with balanced accuracy as the evaluation metric. The per-based model achieved 82.14% balanced accuracy on the test data, handling the class imbalance inherent in per-base tasks and identifying nucleotide binding states quickly and accurately. The miRNA-target binding decision model outperformed existing methods, with a balanced accuracy of 80.39% on the test data. Unlike many deep learning methods that require additional preprocessing, our algorithm predicts per-base binding states directly from full-length sequences and transfers the knowledge learned by the per-based model to the miRNA-target binding decision model. Importantly, our approach relies solely on seed regions, eliminating the need for prior knowledge and improving the reliability of microRNA target prediction. This advance holds promise for biomedical and clinical research.

Citation Format: Chen-Hao Peng, Hui-Yu Chen, Da-Chuan Cheng, Eric Y. Chuang, Chien-Yueh Lee. A CNN-based approach with efficient transfer learning improves microRNA-mRNA prediction [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3513.
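Balanced accuracy, the evaluation metric reported in the abstract, is the mean of sensitivity and specificity, so it does not reward a model for simply predicting the majority class. The sketch below is a generic plain-Python illustration of the metric, not the authors' evaluation code; the function names and the toy labels are assumptions made for the example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels
    (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the binding class
    specificity = tn / (tn + fp)  # recall on the non-binding class
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced labels: 8 binding pairs, 2 non-binding pairs.
y_true = [1] * 8 + [0] * 2
# A trivial classifier that always predicts "binding".
y_pred = [1] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this toy data the always-binding classifier reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which is why the metric suits datasets where one class dominates.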