Abstract
Background: Survival analysis is an important element of reasoning from data. Applied in a number of fields, it has become particularly useful in medicine for estimating the survival rate of patients on the basis of their condition, examination results, and the treatment they are undergoing. Recent developments in next-generation sequencing open new opportunities in survival studies, as they allow vast numbers of genome-, transcriptome-, and proteome-related features to be investigated. These include single nucleotide and structural variants, expressions of genes and microRNAs, DNA methylation, and many others.

Results: We present LR-Rules, a new algorithm for rule induction from survival data. It follows the separate-and-conquer heuristic and uses the log-rank test to establish the rule body. Extensive experiments show that LR-Rules generates models of superior accuracy and comprehensibility. A detailed analysis of the rules rendered by the presented algorithm on four medical datasets, concerning leukemia as well as breast, lung, and thyroid cancers, reveals its ability to discover true relations between attributes and patients' survival rate. Two of the case studies incorporate features obtained with high-throughput technologies, showing the usability of the algorithm in the analysis of bioinformatics data.

Conclusions: LR-Rules is a viable alternative to existing approaches to survival analysis, particularly when the interpretability of the resulting model is crucial. The presented algorithm may be especially useful when applied to genomic and proteomic data, as it may contribute to a better understanding of the background of diseases and support their treatment.
Highlights
Survival analysis is an important element of reasoning from data
Each observation is described by the following attributes: hormonal therapy, age, menopausal status, tumour size, tumour grade, number of positive nodes, progesterone
The experiments confirmed that LR-Rules performs significantly better than the KM estimator and the survival trees CTREE and RPART in terms of prediction error
Summary
Survival analysis is an important element of reasoning from data. Applied in a number of fields, it has become useful in medicine for estimating the survival rate of patients on the basis of their condition, examination results, and the treatment they are undergoing. Recent developments in next-generation sequencing open new opportunities in survival studies, as they allow vast numbers of genome-, transcriptome-, and proteome-related features to be investigated. These include single nucleotide and structural variants, expressions of genes and microRNAs, DNA methylation, and many others. Applying machine learning to survival analysis usually makes it possible to overcome the limitations of statistical methods. In this paper we investigate a rule induction algorithm in combination with the log-rank statistical test [6]. This nonparametric test is used to compare the survival distributions of two samples and is appropriate for the analysis of censored data. As the basis of the rule induction method we selected a separate-and-conquer (also known as covering)
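To make the role of the log-rank test concrete, the following is a minimal sketch of the standard two-sample log-rank statistic for right-censored data. It is not the LR-Rules implementation: the function name `logrank_test`, its argument layout, and the toy data are assumptions made for illustration, and the way the paper applies the test to candidate rules is not reproduced here.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (illustrative sketch, hypothetical helper).

    times_*  : observed follow-up times
    events_* : 1 if the event was observed at that time, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    # Pool both samples, tagging each observation with its group (0 = a, 1 = b).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only the distinct times at which an event occurred contribute.
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before t (censored at t counts as at risk).
        n_a = sum(1 for u, _, g in data if g == 0 and u >= t)
        n_b = sum(1 for u, _, g in data if g == 1 and u >= t)
        # Events occurring exactly at t.
        d_a = sum(1 for u, e, g in data if g == 0 and u == t and e)
        d_b = sum(1 for u, e, g in data if g == 1 and u == t and e)
        n, d = n_a + n_b, d_a + d_b

        # Expected events in group a under the null hypothesis of equal hazards.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group a.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0

    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (made up): group a fails early, group b fails late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

In a covering-style rule learner, a statistic of this kind could serve as a quality measure by comparing the observations covered by a candidate rule with those left uncovered; larger values indicate a larger difference between the two survival distributions.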