Abstract
Background: MicroRNA (miRNA) is a short RNA (~ 22 nt) that regulates gene expression at the posttranscriptional level. Aberrant miRNA expression can affect the target mRNAs involved in cancer-related signaling pathways. We conduct a clustering analysis of miRNAs and mRNAs using expression data from the Cancer Genome Atlas (TCGA). We combine two algorithms from graph theory, the Hungarian algorithm and the blossom algorithm. Data analysis is performed in R and Python.
Methods: We first quantify the edge weights of the miRNA-mRNA pairs by combining their expression correlation coefficient in tumor (T_CC) with their correlation coefficient in normal tissue (N_CC). We then introduce a bipartite graph partition procedure to identify cluster candidates. Specifically, we propose six weight formulas to quantify the change of the miRNA-mRNA expression T_CC relative to N_CC, and apply traditional hierarchical clustering to subjectively evaluate the different weight formulas of miRNA-mRNA pairs. Among these six weight formulas, we choose the optimal one, which we define as the integrated mean value weights, to represent the connections between miRNAs and mRNAs. The Hungarian algorithm and the blossom algorithm are then employed on the miRNA-mRNA bipartite graph to passively determine the clusters. The combination of the Hungarian and blossom algorithms is dubbed the maximum weighted merger method (MWMM).
Results: MWMM identifies clusters of different sizes that meet two criteria: the mathematical criterion that internal connections inside a cluster are relatively denser than external connections outside the cluster, and the biological criterion that the intra-cluster Gene Ontology (GO) term similarities are larger than the inter-cluster GO term similarities. MWMM was developed using breast invasive carcinoma (BRCA) as the training data set, but it can also be applied to data sets of other cancer types. Compared with other algorithms, MWMM shows an advantage in GO term similarity in most cancer types.
Conclusions: miRNAs and mRNAs that are likely to be affected by common underlying causal factors in cancer can be clustered by the MWMM approach. These clusters could potentially be used as candidate biomarkers for different cancer types and provide clues for targets of precision medicine in cancer treatment.
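To make the edge-weight step in the Methods concrete, the following is a minimal sketch of how T_CC and N_CC could be computed for every miRNA-mRNA pair and combined into a weight. The abstract does not spell out the six weight formulas or the integrated mean value weights, so the absolute difference |T_CC - N_CC| used here is only an illustrative stand-in, and the function names and the random toy matrices are assumptions.

```python
import numpy as np

def cross_correlation(mirna, mrna):
    """Pearson correlation between every miRNA row and every mRNA row.

    mirna has shape (n_mirna, n_samples) and mrna has shape
    (n_mrna, n_samples); the result has shape (n_mirna, n_mrna).
    """
    # Center each expression profile, then scale it to unit length so that
    # the dot product of two profiles equals their Pearson correlation.
    a = mirna - mirna.mean(axis=1, keepdims=True)
    b = mrna - mrna.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def edge_weights(t_cc, n_cc):
    """Hypothetical weight: size of the correlation change from normal to tumor.

    This is NOT one of the paper's six formulas, only a placeholder.
    """
    return np.abs(t_cc - n_cc)

# Toy data standing in for TCGA expression matrices (rows = genes, columns = samples).
rng = np.random.default_rng(0)
mirna_tumor, mrna_tumor = rng.normal(size=(3, 20)), rng.normal(size=(5, 20))
mirna_normal, mrna_normal = rng.normal(size=(3, 12)), rng.normal(size=(5, 12))

t_cc = cross_correlation(mirna_tumor, mrna_tumor)
n_cc = cross_correlation(mirna_normal, mrna_normal)
weights = edge_weights(t_cc, n_cc)  # one weight per miRNA-mRNA edge
```

The resulting matrix can be read as the weighted adjacency of the miRNA-mRNA bipartite graph, with miRNAs as rows and mRNAs as columns.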
Highlights
MicroRNA is a short RNA (~ 22 nt) that regulates gene expression at the posttranscriptional level
A maximum weighted merger method (MWMM) pipeline, consisting of continuous iterations of the Hungarian algorithm followed by several rounds of the blossom algorithm, was used to passively cluster the miRNA-mRNA pairs based on maximum weighted edge matching in the bipartite graph and in the general graph
If the miRNA-mRNA pairs with large edge weights fall into many small disjoint clusters, the number of disjoint clusters will be larger, which suggests that there is no coordinated interaction in the changes of the expression correlation coefficients from normal to tumor
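The two matching steps named in the highlights can be sketched as follows, assuming SciPy and NetworkX are available. One Hungarian round pairs each miRNA with one mRNA on the bipartite graph, and one blossom round then merges the matched pairs two at a time on a general graph. How MWMM actually iterates these rounds and scores mergers is not described here, so the merge score (the sum of the cross miRNA-mRNA weights between two pairs), the function names, and the toy weight matrix are all assumptions made for illustration.

```python
import numpy as np
import networkx as nx
from scipy.optimize import linear_sum_assignment

def hungarian_pairs(weights):
    """Maximum-weight one-to-one miRNA-mRNA matching on the bipartite graph."""
    rows, cols = linear_sum_assignment(weights, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))

def blossom_merge(pairs, weights):
    """Merge matched pairs two at a time with a maximum-weight general matching.

    Node k stands for pairs[k]. The edge weight between two pairs is the sum
    of their cross miRNA-mRNA weights (an assumption made for this sketch).
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(pairs)))
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, j), (k, l) = pairs[a], pairs[b]
            g.add_edge(a, b, weight=float(weights[i, l] + weights[k, j]))
    # NetworkX implements maximum-weight matching with a blossom-based method.
    matching = nx.max_weight_matching(g)
    return sorted(sorted(edge) for edge in matching)

# Toy weight matrix: rows are miRNAs, columns are mRNAs (hypothetical values).
W = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.8, 0.7],
    [0.0, 0.1, 0.6, 0.9],
])
pairs = hungarian_pairs(W)        # one miRNA-mRNA pair per miRNA
merged = blossom_merge(pairs, W)  # indices of pairs merged into one cluster
```

In this toy matrix the first two miRNAs and mRNAs are strongly connected to each other, as are the last two, so the merge round groups the pairs accordingly. Repeating the merge round on the merged clusters would grow larger clusters, but the stopping rule would have to come from the full paper.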
Summary
MicroRNA (miRNA) is a short RNA (~ 22 nt) that regulates gene expression at the posttranscriptional level. We conduct a clustering analysis of miRNAs and mRNAs using expression data from the Cancer Genome Atlas (TCGA). Transcriptional and other alterations can be comprehensively identified from next-generation sequencing (NGS) data of samples collected from tumor tissue and adjacent normal tissue of the same patients with a specific type of cancer. These data are collected and organized by projects such as the International Cancer Genome Consortium (ICGC) [1], the Encyclopedia of DNA Elements (ENCODE) [2], and the Cancer Genome Atlas (TCGA) [3]. TCGA data have been used to characterize key genomic changes, find novel mutations, define intrinsic tumor types, discover similarities and differences across cancer types, reveal therapy resistance mechanisms, and collect evidence of tumor evolution [3].