Data Mining Problems Research Articles

Data mining has actively contributed to solving many real-world problems with a variety of techniques. Traditional approaches in this field are classification, clustering and regression. During the last few years a number of challenges have emerged, such as imbalanced data, multi-label and multi-instance problems, low-quality and/or noisy data, and semi-supervised learning, among others [item 1) in the Appendix]. When these non-standard scenarios arise in the realm of big data, they remain largely uncharted research territory, although a growing effort is being made to push past the current limits. The current trend is to address both the classical and the newly emerging data mining problems in big data and knowledge processing. Granular computing provides a powerful tool for multi-granularity and multi-view data analysis at different granularity levels, and it has demonstrated strong capabilities and advantages in intelligent data analysis, pattern recognition, machine learning and uncertain reasoning [item 2) in the Appendix]. Big data often contains a significant amount of unstructured, uncertain and imprecise data. There are new challenges regarding the scalability of granular computing when addressing very big data sets [item 3) in the Appendix]. Big data mining relies on distributed computational strategies, because it is often impossible to store and process the data on a single computing node. The exploration of data mining and granular computing in big data and knowledge processing is an emerging field that crosses multiple research disciplines and industry domains, including transportation, communications, social networks, medical health, and so on.

Discovering Erasable Patterns (EPs) consists of identifying product parts that will produce a small profit loss if their production is stopped. It is a data mining problem that has attracted the attention of numerous researchers in recent years because EPs can be used to reduce the profit loss of manufacturers. Although many algorithms have been designed to mine EPs, an important limitation of state-of-the-art EP mining algorithms is that they are batch algorithms; that is, they are designed to be applied to static databases. In real-life applications, however, databases are dynamic, as they are constantly updated by adding or removing products and parts. To be informed about EPs in real time, traditional EP mining algorithms must be applied over and over again to a database. This is inefficient, as those algorithms are always run from scratch without taking advantage of the results generated by previous executions. To address this drawback of previous work in handling real-life dynamic data, this paper proposes an efficient algorithm named MSPPC for mining EPs in data streams. It relies on a novel tree structure named the SPPC (Streaming Pre-Post Code) tree, which extends the WPPC tree structure to maintain a compact tree representation of EPs in a data stream. Experimental results show that the designed MSPPC algorithm outperforms the state-of-the-art batch MERIT and dMERIT algorithms when they are run in batch mode using a sliding window. In addition, the proposed algorithm is faster than the state-of-the-art algorithms for mining EPs, namely MERIT, dMERIT+, MEI and EIFDD.
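
To make the problem statement above concrete, here is a minimal Python sketch of erasable pattern mining. It is not the MSPPC algorithm and does not build an SPPC tree. It assumes the formulation commonly used in the EP literature, which the abstract does not spell out: each product is a set of parts with a profit, the gain of a pattern is the total profit of the products that use at least one of its parts, and a pattern is erasable when its gain is at most a chosen fraction of the total profit. The function names, the sample data and the threshold are all hypothetical. The second function shows the naive sliding-window strategy the abstract calls inefficient: every window is mined again from scratch.

```python
from collections import deque
from itertools import combinations


def gain(pattern, products):
    """Profit lost if every part in `pattern` is discontinued (assumed definition)."""
    return sum(profit for parts, profit in products if parts & pattern)


def erasable_patterns(products, threshold):
    """Brute-force search: keep patterns whose gain <= threshold * total profit."""
    total = sum(profit for _, profit in products)
    limit = threshold * total
    # All distinct parts that appear in at least one product.
    items = sorted(set().union(*(parts for parts, _ in products)))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            g = gain(frozenset(combo), products)
            if g <= limit:
                result[combo] = g
    return result


def remine_over_sliding_window(stream, window_size, threshold):
    """Naive streaming baseline: re-run the batch miner on every full window."""
    window = deque(maxlen=window_size)
    for product in stream:
        window.append(product)
        if len(window) == window_size:
            # Nothing is reused from the previous window.
            yield erasable_patterns(list(window), threshold)


# Hypothetical product database: (set of parts, profit).
sample_products = [
    ({"a", "b"}, 10),
    ({"b", "c"}, 20),
    ({"c", "d"}, 4),
    ({"e"}, 66),
]

if __name__ == "__main__":
    print(erasable_patterns(sample_products, threshold=0.25))
    for patterns in remine_over_sliding_window(sample_products, 2, 0.25):
        print(patterns)
```

The brute-force enumeration is exponential in the number of parts, so it is only suitable for illustration. An incremental method would aim to update its data structure as products enter and leave the window instead of recomputing every gain.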

Articles published on Data Mining Problems

A survey on big data: an emerging imparity and revolution in digital world

Behavior Action Mining

Fast approximate nearest neighbor search with the navigating spreading-out graph

Short-text learning in social media: a review

A New Application of Optimized Random Forest Algorithms in Intelligent Fault Location of Rudders

Usage of the rough set theory for generating decision rules of number of traffic vehicles

IEEE Access Special Section Editorial: Data Mining and Granular Computing in Big Data and Knowledge Processing

Location-Based Parallel Sequential Pattern Mining Algorithm

Record linkage based on a three-way decision with the use of granular descriptors

Exploiting Data Mining for Fast Inter Prediction Mode Decision in HEVC

High Utility Infrequent Itemset Mining Using a Customized Ant Colony Algorithm

Distributed Data Mining for Multiple Sourced Heterogeneous Datasets: A Survey

PyHIVE, a health-related image visualization and engineering system using Python

A novel distance measure for time series: Maximum shifting correlation distance

Data Mining Approach in Retail Knowledge Discovery and Internet Technologies

Evolutionary Propositionalization of Multi-Relational Data — Research Notes

A Novel Honey-Bees Mating Optimization Approach with Higher order Neural Network for Classification

SPPC: a new tree structure for mining erasable patterns in data streams

A high-performance clustering algorithm based on searched experiences

Spatio-Temporal Data Mining
