Abstract

Although modern methods of whole-genome DNA methylation analysis have a wide range of applications, they are not suitable for clinical diagnostics because of their high cost and complexity and the large amount of sample DNA they require. It is therefore crucial to identify a relatively small number of methylation sites that provide high precision and sensitivity for the diagnosis of pathological states. We propose an algorithm for constructing limited subsamples from high-dimensional data to form diagnostic panels. We have developed a tool that applies different selection methods to find an optimal, minimum necessary combination of factors, using the cross-entropy loss metric (LogLoss) to identify a subset of methylation sites. We show that the algorithm works effectively with different genome methylation patterns when combined with ensemble-based machine learning methods. Algorithm efficiency, precision and robustness were evaluated on five genome-wide DNA methylation datasets (626 samples in total), with the samples in each dataset classified as tumor or non-tumor. The algorithm produced an AUC of 0.97 (95% CI: 0.94–0.99, 9 sites) for prostate adenocarcinoma and an AUC of 1.0 (from 2 to 6 sites) for urothelial bladder carcinoma, two types of kidney carcinoma and colorectal carcinoma. For prostate adenocarcinoma, we showed that the identified differential variability methylation patterns distinguish a cluster of samples with a higher recurrence rate (hazard ratio for recurrence = 0.48, 95% CI: 0.05–0.92; log-rank test, p-value < 0.03). We also identified several clusters of correlated, interchangeable methylation sites that can support the biological interpretation of the resulting models and the further selection of the sites most suitable for designing diagnostic panels.
LogLoss-BERAF is implemented as a standalone Python program. The open-source code is freely available at https://github.com/bioinformatics-IBCH/logloss-beraf, along with the models described in this article.
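
To make the idea of LogLoss-driven site selection concrete, the following is a minimal sketch and not the LogLoss-BERAF implementation. It assumes a greedy backward elimination that drops one site at a time while the held-out LogLoss does not get worse. A toy nearest-centroid scorer stands in for the ensemble-based models mentioned in the abstract, and the data are synthetic. All function names (`log_loss`, `centroid_proba`, `backward_eliminate`, `make_data`) and parameters (`scale`, `tol`) are hypothetical.

```python
import math
import random


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (LogLoss); lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def centroid_proba(train_X, train_y, test_X, sites, scale=10.0):
    """Toy stand-in classifier: P(tumor) from distances to class centroids."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        return [sum(r[s] for r in rows) / len(rows) for s in sites]

    c0, c1 = centroid(0), centroid(1)
    probs = []
    for x in test_X:
        d0 = sum((x[s] - m) ** 2 for s, m in zip(sites, c0))
        d1 = sum((x[s] - m) ** 2 for s, m in zip(sites, c1))
        # Far from the non-tumor centroid and close to the tumor one -> p near 1.
        probs.append(1.0 / (1.0 + math.exp(-scale * (d0 - d1))))
    return probs


def backward_eliminate(train, test, sites, tol=0.0):
    """Drop one site at a time while held-out LogLoss does not get worse."""
    (train_X, train_y), (test_X, test_y) = train, test

    def score(subset):
        return log_loss(test_y, centroid_proba(train_X, train_y, test_X, subset))

    selected = list(sites)
    best = score(selected)
    while len(selected) > 1:
        # Try removing each remaining site and keep the most helpful removal.
        trials = [(score([s for s in selected if s != drop]), drop)
                  for drop in selected]
        loss, drop = min(trials)
        if loss > best + tol:
            break  # every possible removal hurts: stop
        selected.remove(drop)
        best = loss
    return selected, best


def make_data(rng, n):
    """Synthetic beta-like values: site 0 is informative, sites 1-5 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = tumor, 0 = non-tumor (assumed encoding)
        informative = (0.8 if label else 0.2) + rng.gauss(0.0, 0.05)
        X.append([informative] + [rng.random() for _ in range(5)])
        y.append(label)
    return X, y


rng = random.Random(0)
train, test = make_data(rng, 60), make_data(rng, 40)
selected, loss = backward_eliminate(train, test, sites=range(6))
print("selected sites:", selected, "held-out LogLoss:", round(loss, 4))
```

LogLoss is used as the stopping criterion here because it penalizes confident wrong probabilities, so it can still separate candidate panels that all reach the same AUC. A real analysis would replace the toy scorer with an ensemble model and use cross-validation rather than a single held-out split.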

Highlights

  • The aim of this study was to develop a framework for selection of a limited number of diagnostically informative DNA methylation sites and to estimate its potential diagnostic efficiency

  • Since one of the promising current trends is non-invasive PC diagnostics based on DNA methylation markers obtained from urine samples [26,27], we analyzed methylation data for urothelial bladder carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), and kidney renal papillary cell carcinoma (KIRP)

Introduction

Prostate cancer (PC) is one of the most frequently diagnosed oncological diseases in males worldwide [1]. The early stages of PC are characterized by an asymptomatic course, which substantially impedes early diagnosis [2]. Recent experimental data have clarified the role of genetic and epigenetic factors in PC pathogenesis [4]. Among these factors, epigenetic alterations, in particular aberrant DNA methylation of CpG dinucleotides in genes, are of special interest. These alterations are often functionally related to the regulation of expression of tumor suppressors and oncogenes at early stages of both prostate cancer and other types of oncological diseases [5,6].

