In this paper, we propose a scalable mining algorithm that discovers contextual outliers using relevant subspaces. The algorithm is developed under the MapReduce programming model and runs on a Hadoop cluster. Relevant subspaces, which effectively capture the local distribution of the data, are identified using the local sparseness of attribute dimensions. We design a novel way of computing local outlier factors in a relevant subspace from the probability density of the local dataset; this approach effectively reflects the outlier degree of a data object that does not conform to the distribution of the local dataset in the relevant subspace. The attribute dimensions of a relevant subspace and the local outlier factors serve as vital contextual information, which improves the interpretability of outliers. Finally, the $N$ data objects with the largest local outlier factor values are reported as contextual outliers. The mining algorithm, which incorporates a locality-sensitive hashing distribution strategy, is implemented on a Hadoop cluster. Experimental results on both synthetic data and stellar spectral data validate the effectiveness, interpretability, scalability, and extensibility of the algorithm.
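The final selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the dict-of-scores interface are hypothetical, and the local outlier factor values here are made-up placeholders (in the paper they would be computed per relevant subspace from local probability densities).

```python
import heapq

def top_n_contextual_outliers(lof_scores, n):
    """Return the n objects with the largest local outlier factor.

    lof_scores: dict mapping object id -> local outlier factor
    (hypothetical interface; real scores come from the subspace-based
    density computation described in the abstract).
    """
    return heapq.nlargest(n, lof_scores.items(), key=lambda kv: kv[1])

# Placeholder scores for four data objects.
scores = {"a": 1.2, "b": 5.7, "c": 0.9, "d": 3.4}
print(top_n_contextual_outliers(scores, 2))  # [('b', 5.7), ('d', 3.4)]
```

Using a heap-based selection keeps the step at O(m log n) for m objects, which matters when the candidate set produced by the distributed phase is large.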