Abstract

BackgroundAssociation Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on such datasets will yield a large number of rules, many of which may be uninteresting. Especially for imbalanced datasets, performing ARM directly would result in uninteresting rules that are dominated by certain variables that capture general characteristics.MethodsWe introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. Then we remove general and subsumed rules so that unique and non-redundant rules are resulted for a particular query constraint.ResultsApplying QARM on five datasets from NSRR obtained a total of 2517 association rules with a minimum confidence of 60% (using top 100 rules for each query constraint). The results show that merging similar variables could avoid uninteresting rules. Also, removing general and subsumed rules resulted in a more concise and interesting set of rules.ConclusionsQARM shows the potential to support exploratory analysis of large biomedical datasets. It is also shown as a useful method to reduce the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence from biomedical literature, while others without literature-based evidence may serve as the candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.

Highlights

  • Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets

  • A potential issue of ARM, especially when directly used in large biomedical datasets, is that it will result in many uninteresting rules

  • We introduce query-constraint-based ARM (QARM), a query-constraintbased ARM method where the rules mined are based on a subset of data satisfying a certain query constraint

Read more

Summary

Introduction

Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. For imbalanced datasets, performing ARM directly would result in uninteresting rules that are dominated by certain variables that capture general characteristics. Demographic features of patients (e.g., gender and race) always appear in biomedical datasets, which may result in an overwhelming number of gender-related association rules with high support and confidence, which are dominant but less interesting. Another potential challenge of performing ARM in biomedical datasets is the existence of semantically similar variables.

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call