Abstract

Rough set theory and decision trees are data mining methods for dealing with vagueness and uncertainty. They have been used to uncover hidden patterns in complicated datasets collected from industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information from one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (here, susceptibility genes and the disease trait) was not easy to detect directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify susceptibility genes for the disease trait. During the first stage, decision trees were built to predict trait values from the phenotypes of subjects and their parents. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptibility genes during the second stage. Our results showed that the first-stage decision trees had accuracy rates of about 99% in predicting the disease trait. However, both the decision trees and rough set theory failed to identify the true disease-related loci.

Introduction

Data mining approaches have been applied in many areas to derive useful and comprehensive knowledge. Methods have been developed for the main functionalities of data mining, such as classification, prediction, association, and clustering [1]. Variants of decision trees, such as ID3 [2] and C4.5 [3], have become standard tools for classification [4,5]. Tree-based methods have been applied to genome-wide association studies for disease gene mapping [6]. Rough set theory [7] has been used to solve decision problems in business and industrial areas [8-10]. We proposed two-stage methods that use C4.5 decision trees and rough set theory to analyze the Genetic Analysis Workshop 14 (GAW14) simulated data. Our goal was to search for genes conferring susceptibility to Kofendrerd Personality Disorder (KPD), a behavioral disorder with multiple possible phenotype definitions.
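
To make the rough set step more concrete, the sketch below shows one common way to search for minimal attribute subsets (called reducts in rough set theory) that classify a decision attribute as well as the full attribute set does. It is only an illustration under stated assumptions: the marker names `g1`, `g2`, `g3`, the toy table, and the exhaustive search are hypothetical and are not taken from the GAW14 data or from the authors' implementation.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region: rows whose
    indiscernibility class carries a single decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def reducts(rows, attrs, decision):
    """Exhaustively list minimal subsets of attrs that keep the
    dependency degree of the full attribute set (small inputs only)."""
    target = dependency(rows, attrs, decision)
    found = []
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, subset, decision) == target:
                found.append(subset)
    return found


# Hypothetical toy table: three markers and a binary trait.
rows = [
    {"g1": 0, "g2": 1, "g3": 0, "trait": 1},
    {"g1": 0, "g2": 0, "g3": 1, "trait": 0},
    {"g1": 1, "g2": 1, "g3": 0, "trait": 1},
    {"g1": 1, "g2": 0, "g3": 0, "trait": 0},
    {"g1": 0, "g2": 1, "g3": 1, "trait": 1},
]

print(reducts(rows, ["g1", "g2", "g3"], "trait"))  # [('g2',)]
```

In this toy table the trait is fully determined by `g2`, so the only reduct is `('g2',)`. The exhaustive search grows exponentially with the number of attributes, so a real analysis would need a heuristic or a dedicated rough set package.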
