Abstract

Background: Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and is associated with multiple diseases. Wet-lab experiments are costly and time-consuming for identifying disease-associated m7G sites. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches, and the information is publicly available in bioinformatics databases, which can be leveraged to predict potential disease-associated m7G sites computationally. Computational methods for m7G-disease association prediction are therefore urgently needed, but none are currently available.

Results: To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, the m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts potential disease-associated m7G sites by applying a matrix decomposition method to heterogeneous networks that integrate comprehensive similarity information of m7G sites and diseases. To evaluate the prediction performance, 10 runs of tenfold cross validation were first conducted, in which m7GDisAI achieved the highest AUC of 0.740 (± 0.0024). Global and local leave-one-out cross validation (LOOCV) experiments were then carried out to evaluate the model's accuracy in global and local situations, respectively. An AUC of 0.769 was achieved in global LOOCV and 0.635 in local LOOCV. Finally, a case study was conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the associations between the host genes of m7G sites and GO terms. The results showed that the disease-associated m7G sites identified by m7GDisAI and their host genes are consistently related to the pathogenesis of ovarian cancer, which may provide clues to the pathogenesis of diseases.

Conclusion: The m7GDisAI web server can be accessed at http://180.208.58.66/m7GDisAI/, which provides a user-friendly interface for querying disease-associated m7G sites. The list of the top 20 m7G sites predicted to be associated with 177 diseases can be retrieved. Furthermore, detailed information about specific m7G sites and diseases is also shown.

Highlights

  • Over 150 types of RNA modifications have been identified in RNA molecules [1, 2]. N7-methylguanosine (m7G), the methylation of guanosine (G) at position N7, is a typical positively charged modification present in tRNA [3], rRNA [4], the mRNA 5′ cap [5], and internal mRNA regions [6], where it plays a critical role in regulating RNA processing, metabolism, and function

  • As a positively charged RNA modification, m7G could tune RNA secondary structures or protein-RNA interactions through a combination of electrostatic and steric effects [7]. m7G sites in the variable loops of several tRNAs, which are installed by the METTL1-WDR4 heterodimer in mammals [3], have been reported to stabilize the tRNA tertiary fold [8, 9]. m7G installed at the 5′ cap stabilizes transcripts against exonucleolytic degradation [10] and modulates nearly every stage of the mRNA life cycle, including transcription elongation [11], pre-mRNA splicing [12], polyadenylation [13], nuclear export [14], and translation [15]

  • Experimental design: To systematically evaluate the prediction performance of m7GDisAI on the m7G-disease association dataset, tenfold cross validation and leave-one-out cross validation (LOOCV) strategies were adopted for the experiments

Summary

Results

Experimental design: To systematically evaluate the prediction performance of m7GDisAI on the m7G-disease association dataset, tenfold cross validation and LOOCV strategies were adopted for the experiments. After m7GDisAI was run on the training set, the test associations were ranked together with the candidate associations in descending order of predicted value. In tenfold cross validation, global LOOCV, and local LOOCV alike, for a given threshold τ, a test association is regarded as a true positive (TP) if it ranks above the threshold and as a false negative (FN) otherwise. Since CNFHN has the best performance in the tenfold cross validation experiments, we applied it to the training samples to score the candidate samples, particularly those for ovarian cancer. Ginath et al. [53] reported that ERBB2 (host gene of m7G_ID_268139) activates multiple downstream signaling pathways and promotes the proliferation, invasion, and metastasis of tumor cells
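
The ranking-based evaluation described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each LOOCV fold holds out one known association and scores it alongside a list of candidate (unlabeled) associations, that candidates ranked above the threshold count as false positives, and that ties are resolved against the test association. The function names `rank_in_list`, `roc_curve`, and `auc` are hypothetical. Whether the candidate list spans all diseases (one reading of "global") or only the disease under study (one reading of "local") is left to the caller, since the text does not define the two settings.

```python
def rank_in_list(test_score, candidate_scores):
    """1-based rank of the held-out association when sorted with the
    candidates in descending order of predicted score.
    Assumption: tied candidates are placed ahead of the test association."""
    return 1 + sum(1 for s in candidate_scores if s >= test_score)


def roc_curve(folds):
    """Build ROC points from LOOCV folds.

    folds: list of (test_score, candidate_scores) pairs, one per held-out
    association. For a rank threshold tau, a test association is a TP if
    its rank is <= tau and an FN otherwise; candidates inside the top tau
    positions are counted as FPs (assumption, see lead-in).
    """
    ranks = [rank_in_list(t, c) for t, c in folds]
    sizes = [len(c) for _, c in folds]
    total_candidates = sum(sizes)
    points = [(0.0, 0.0)]
    for tau in range(1, max(sizes) + 2):
        tp = sum(1 for r in ranks if r <= tau)
        # The top tau positions hold min(tau, n + 1) items; subtract the
        # test association itself when it falls inside them.
        fp = sum(min(tau, n + 1) - (1 if r <= tau else 0)
                 for r, n in zip(ranks, sizes))
        points.append((fp / total_candidates, tp / len(folds)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy folds with made-up scores: one test association ranked first,
# one ranked last among three candidates.
toy_folds = [(0.9, [0.1, 0.2, 0.3]), (0.05, [0.1, 0.2, 0.3])]
print(auc(roc_curve(toy_folds)))
```

Sweeping the threshold over rank positions rather than raw scores keeps folds comparable even when predicted values are on different scales from one fold to the next.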
