Abstract

MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3’ UTR of target mRNAs to repress protein synthesis. Soon after their discovery, microRNA dysregulation was associated with several pathologies. In particular, microRNAs have often been reported as differentially expressed between healthy and tumor samples. This suggests that microRNAs are good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once is now routine. Moreover, the collaborative effort of sharing data opens the possibility of population analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task: given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that leads to the highest classification accuracy. We address this task with a combination of data mining tools: differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and a support vector machine (SVM) for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels overlap very little, suggesting that miRNAs are not only predictive of the onset of cancer but can be used for classification purposes as well. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis showed that miRNAs can be used to evaluate cancer severity.
In summary, the results show that our method produces cancer-specific panels that are promising candidates for subsequent in vitro validation.
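
The subset-selection formulation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a textbook DE/rand/1/bin differential evolution over real-valued vectors thresholded into feature masks, it replaces the SVM with a simple nearest-centroid classifier to stay dependency-free, and it omits the component-analysis step entirely. The function names, the per-feature penalty of 0.01, and the toy expression matrix are all hypothetical.

```python
import random

def centroid(rows, cols):
    """Mean expression vector of `rows`, restricted to the columns in `cols`."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in).

    Assumes labels are 0 (healthy) / 1 (tumor) and at least two samples per class.
    """
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        cents = {}
        for lab in (0, 1):
            rows = [x for j, x in enumerate(X) if j != i and y[j] == lab]
            cents[lab] = centroid(rows, cols)

        def dist(lab):
            return sum((X[i][c] - m) ** 2 for c, m in zip(cols, cents[lab]))

        hits += min((0, 1), key=dist) == y[i]
    return hits / len(X)

def select_panel(X, y, pop_size=12, generations=30, F=0.8, CR=0.9, seed=0):
    """DE/rand/1/bin search for a feature subset; requires pop_size >= 4."""
    rng = random.Random(seed)
    n = len(X[0])

    def decode(v):
        # A feature belongs to the panel when its gene exceeds 0.5.
        return [c for c in range(n) if v[c] > 0.5]

    def fitness(v):
        cols = decode(v)
        # Hypothetical penalty that favors smaller panels.
        return loo_accuracy(X, y, cols) - 0.01 * len(cols)

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    scores = [fitness(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = [
                pop[a][k] + F * (pop[b][k] - pop[c][k])
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(n)
            ]
            s = fitness(trial)
            if s >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(pop[best]), scores[best]

# Toy data: 12 samples, 3 features; only feature 0 separates the two classes.
y_toy = [0] * 6 + [1] * 6
X_toy = [[10.0 * lab + (i % 3), (i * 7) % 5, (i * 3) % 4]
         for i, lab in enumerate(y_toy)]

panel, score = select_panel(X_toy, y_toy)
print("selected feature indices:", panel, "penalized score:", score)
```

In a realistic setting the fitness function would be a cross-validated SVM accuracy computed on real miRNA expression profiles, and the search would run over hundreds of candidate miRNAs rather than three toy features.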

Highlights

  • Timing and accuracy in cancer diagnosis are among the most critical factors that influence the clinical history of a patient

  • These results can be interpreted as an empirical justification of two choices: using a feature selection method able to identify non-linear correlations, and addressing the identification of putative miRNA biomarkers as a classification problem rather than a clustering one

  • Our approach combines existing data mining tools, based on the observation that the problem of finding biomarkers can be mapped to a two-class classification task

Summary

Introduction

Timing and accuracy in cancer diagnosis are among the most critical factors that influence the clinical history of a patient. Histological analysis of a small sample of tumor cells has been the only tool for cancer classification. The complexity of this pathology and the histological similarity of certain sub-classes have motivated researchers to find easier diagnostic techniques that can be used on a large scale [1]. The relationship between cellular and circulating miRNAs has already been elucidated [8]. These facts open the way to a new generation of miRNA-based non-invasive biomarkers [9].

