Abstract

Background: The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology.

Objective: The aim of this study was to reanalyze this same dataset using a data mining approach in order to find models that improve the classification accuracy of the genes studied.

Methods: We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model.

Results: Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods.

Conclusions: Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases.

Highlights

  • Univariate and multivariate analyses are the two main conventional approaches to statistical analysis in the scientific method

  • We studied whether haplotypes of the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) improved the predictive capacity of the interferon lambda-3 (IFNL3) genotype and found that different combinations of these genes (HLA-B*44, HLA-C*12, and KIR3DS1), together with the IFNL3 genotype, increased the classification accuracy of hepatitis C virus (HCV) outcome

  • We focused first on the partial decision trees (PART)-1 model, which was constructed with just 4 different features: IFNL3, HLA-B*44, KIR2DS1, and KIR3DS1 (Multimedia Appendix 1)
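
The general idea of training separate models on different feature subsets and then combining them can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the records in TOY_DATA are invented, the group labels and the three feature subsets are hypothetical, and a simple lookup-table classifier stands in for the partial decision trees (PART) used in the study. The importance-based feature selection is not reproduced here.

```python
from collections import Counter, defaultdict

# Invented toy records (1 = carrier/favourable, 0 = not); NOT the study data.
# Feature names follow the PART-1 highlight; labels are hypothetical groups.
FEATURES = ["IFNL3", "HLA-B*44", "KIR2DS1", "KIR3DS1"]
RAW_ROWS = [
    ((1, 0, 0, 1), "group_1"),
    ((1, 1, 0, 0), "group_1"),
    ((1, 0, 1, 1), "group_1"),
    ((0, 1, 0, 1), "group_1"),
    ((0, 0, 1, 0), "group_2"),
    ((0, 1, 1, 0), "group_2"),
    ((0, 0, 0, 0), "group_2"),
    ((1, 0, 1, 0), "group_2"),
]
TOY_DATA = [(dict(zip(FEATURES, values)), label) for values, label in RAW_ROWS]


def train_table_model(rows, subset):
    """Stand-in classifier: map each observed value combination of the
    features in `subset` to the majority label seen for it in `rows`."""
    counts = defaultdict(Counter)
    overall = Counter()
    for features, label in rows:
        key = tuple(features[name] for name in subset)
        counts[key][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # used for unseen combinations
    table = {key: c.most_common(1)[0][0] for key, c in counts.items()}

    def predict(features):
        key = tuple(features[name] for name in subset)
        return table.get(key, default)

    return predict


def majority_vote(models):
    """Combine several models by taking the most frequent prediction."""
    def predict(features):
        votes = Counter(model(features) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def accuracy(model, rows):
    """Fraction of rows whose label the model reproduces."""
    return sum(model(features) == label for features, label in rows) / len(rows)


# Hypothetical subsets; an odd number of models avoids tied votes
# when there are only two labels.
SUBSETS = [["IFNL3"], ["IFNL3", "KIR3DS1"], FEATURES]
models = [train_table_model(TOY_DATA, subset) for subset in SUBSETS]
ensemble = majority_vote(models)

for subset, model in zip(SUBSETS, models):
    print(subset, accuracy(model, TOY_DATA))
print("ensemble", accuracy(ensemble, TOY_DATA))
```

The accuracies printed here are computed on the same toy records used for training, so they only show the mechanics of the comparison. A real evaluation would use held-out data or cross-validation.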


Summary

Introduction

Univariate and multivariate analyses are the two main conventional approaches to statistical analysis in the scientific method. Multivariate analysis in particular is used to determine the contribution of several factors (risk factors in biomedicine) to a single event or result. Genome-wide association studies (GWAS) have been widely used in case-control settings to identify which genetic variants, known as single nucleotide polymorphisms (SNPs), are associated with human diseases or traits [1,2]. A number of studies have performed univariate and multivariate analyses based on the results of GWAS in order to identify new risk or protective factors. A 2017 study by our group used this method to analyze two groups of patients diagnosed with hepatitis C virus (HCV) infection [3]. The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology.
