Association rule mining algorithms on high-dimensional datasets

Dongmei Ai, Xiaoxin Li, Di He, Yingxin Gao, Hongfei Pan

doi:10.1007/s10015-018-0437-y

Abstract

Bioinformatics has been advancing rapidly, producing datasets with more features and larger volumes. These rapid changes have also posed challenges to data mining applications, in particular to efficient association rule mining. Many data mining algorithms for high-dimensional datasets have been put forward, but the sheer number of these algorithms, with their varying features and application scenarios, makes it difficult to choose a suitable one. We therefore present a general survey of association rule mining algorithms applicable to high-dimensional datasets. We explain the main characteristics and relative merits of these algorithms and, drawing on previous studies, point out areas for improvement and optimization strategies that might be better adapted to high-dimensional datasets. Generally speaking, association rule mining algorithms that combine diverse optimization methods with advanced computing techniques can better balance scalability and interpretability.

Highlights

Association rules mining (ARM), an important branch of data mining, has been extensively used in many areas since Agrawal first introduced it in 1993 [1].
ARM can be seen as a method aimed at discovering groups of items that co-occur with high frequency.
A typical application of ARM on high-throughput datasets is gene association analysis (GAA) [2, 3], in which the goal is to exploit the relationships among different genes based on their corresponding expression levels.
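
To make "groups of items that co-occur with high frequency" concrete, the following is a minimal brute-force sketch of support and confidence, the two measures commonly used in ARM. The gene names, the toy transactions, and the 0.6 support threshold are hypothetical and chosen only for illustration; this is not one of the algorithms surveyed in the paper, and exhaustive enumeration like this is only feasible for a handful of items.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the genes flagged as
# "highly expressed" in one sample (one transaction per sample).
transactions = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets and keep those meeting `min_support`.

    Brute force: the number of candidates doubles with every extra item.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


if __name__ == "__main__":
    # With the assumed threshold of 0.6, the three single genes and the
    # three gene pairs are frequent; the full triple (support 0.4) is not.
    for itemset, s in frequent_itemsets(transactions, 0.6).items():
        print(sorted(itemset), s)
    print(confidence({"geneA"}, {"geneB"}, transactions))
```

A rule such as geneA -> geneB is then usually reported only when both its support and its confidence exceed user-chosen thresholds.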

Summary

Introduction

Association rules mining (ARM), an important branch of data mining, has been extensively used in many areas since Agrawal first introduced it in 1993 [1]. In contrast to other data mining methods that rely on statistical models, ARM can extract possible relationships between variables. Data from high-throughput techniques often share the feature of high dimensionality. The number of genes in a given study can be in the thousands, while the number of specimens is generally in the dozens or hundreds. Such high dimensionality also holds for other kinds of biomedical datasets, e.g., Operational Taxonomic Unit (OTU) abundance datasets that have different levels of extra environmental factors in metagenomics analysis [4], as well as multiple datasets, including mRNA/miRNA expression data and Copy Number Variations (CNV) data from The Cancer Genome Atlas (TCGA) project [5]. To address the performance problems posed by high-dimensional datasets, multiple specialized algorithms have been proposed in the last decade.
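
The effect of high dimensionality on the search space can be sketched in a few lines. The example below assumes, purely for illustration, that an expression matrix is turned into transactions by a fixed threshold; the matrix values, gene names, and the cut-off of 1.0 are all hypothetical, and real studies may use other discretization schemes.

```python
# Hypothetical expression matrix: rows are specimens, columns are genes.
genes = ["g1", "g2", "g3", "g4"]
expression = [
    [5.1, 0.2, 3.3, 0.1],
    [4.8, 0.3, 0.4, 2.9],
    [0.5, 3.7, 3.1, 0.2],
]
THRESHOLD = 1.0  # assumed cut-off for calling a gene "expressed"


def to_transactions(matrix, names, threshold):
    """Turn each specimen (row) into the set of genes above `threshold`."""
    return [
        {name for name, value in zip(names, row) if value > threshold}
        for row in matrix
    ]


def candidate_itemset_count(n_items):
    """Number of non-empty itemsets over `n_items` items: 2**n - 1."""
    return 2 ** n_items - 1


if __name__ == "__main__":
    print(to_transactions(expression, genes, THRESHOLD))
    # Four genes give 15 possible itemsets.
    print(candidate_itemset_count(4))
    # With a thousand genes the count has hundreds of digits, so exhaustive
    # enumeration is out of reach no matter how few specimens there are.
    print(len(str(candidate_itemset_count(1000))))
```

The number of specimens only determines how many transactions are scanned, whereas the number of genes determines how many candidate itemsets exist, which is why datasets with thousands of genes and only dozens of specimens are hard for basic ARM algorithms.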

Basic association rule mining algorithms

Maximal frequent itemset mining and frequent closed itemset mining

Algorithms applicable to high‐dimensional datasets

Discussion

Methods

Conclusion


Journal: Artificial Life and Robotics	Publication Date: May 30, 2018
Citations: 18	License type: open-access


