Distributed Data Mining Research Articles

Credit card transactions continue to grow in number, taking an ever-larger share of the US payment system and leading to a higher rate of stolen account numbers and subsequent losses by banks. Improved fraud detection has thus become essential to maintaining the viability of the US payment system. Banks have used early fraud warning systems for some years. Large-scale data-mining techniques can improve on the state of the art in commercial practice. Developing scalable techniques that analyze massive amounts of transaction data and efficiently compute fraud detectors in a timely manner is an important problem, especially for e-commerce. Besides scalability and efficiency, the fraud-detection task exhibits technical problems that include skewed distributions of training data and nonuniform cost per error, neither of which has been widely studied in the knowledge-discovery and data-mining community. In this article, we survey and evaluate a number of techniques that address these three main issues concurrently. Our proposed methods of combining multiple learned fraud detectors under a cost model are general and demonstrably useful; our empirical results demonstrate that we can significantly reduce loss due to fraud through distributed data mining of fraud models.
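As a rough illustration of what "combining multiple learned fraud detectors under a cost model" can look like, here is a minimal Python sketch. It is not the method from the article: the three rule-based detectors stand in for learned classifiers, majority voting stands in for whatever combining scheme the authors use, and the cost model (a fixed `OVERHEAD` for every investigated transaction, the full amount lost for every missed fraud) is an assumption made only for this example. All names, thresholds, and records are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

Transaction = Dict[str, Any]
Detector = Callable[[Transaction], bool]

# Hypothetical fixed cost of investigating one flagged transaction.
OVERHEAD = 75.0


def majority_vote(detectors: Sequence[Detector], tx: Transaction) -> bool:
    """Flag the transaction if more than half of the detectors flag it."""
    votes = sum(1 for detect in detectors if detect(tx))
    return votes * 2 > len(detectors)


def total_cost(predict: Detector,
               transactions: Sequence[Transaction],
               overhead: float = OVERHEAD) -> float:
    """Nonuniform cost per error (assumed model):
    a flagged transaction costs the investigation overhead,
    a missed fraud costs the full transaction amount,
    an unflagged legitimate transaction costs nothing."""
    cost = 0.0
    for tx in transactions:
        if predict(tx):
            cost += overhead
        elif tx["is_fraud"]:
            cost += tx["amount"]
    return cost


# Stand-ins for detectors learned on different data partitions.
detectors: List[Detector] = [
    lambda tx: tx["amount"] > 400.0,  # large purchase
    lambda tx: tx["hour"] < 6,        # night-time purchase
    lambda tx: tx["foreign"],         # purchase made abroad
]

# Tiny made-up sample; real data would be far more skewed toward legitimate.
transactions: List[Transaction] = [
    {"amount": 20.0, "hour": 14, "foreign": False, "is_fraud": False},
    {"amount": 900.0, "hour": 3, "foreign": True, "is_fraud": True},
    {"amount": 450.0, "hour": 10, "foreign": True, "is_fraud": True},
    {"amount": 600.0, "hour": 13, "foreign": False, "is_fraud": False},
    {"amount": 35.0, "hour": 23, "foreign": True, "is_fraud": False},
]

combined_cost = total_cost(lambda tx: majority_vote(detectors, tx), transactions)
single_costs = [total_cost(detect, transactions) for detect in detectors]
no_detection_cost = total_cost(lambda tx: False, transactions)

print("combined:", combined_cost)
print("individual:", single_costs)
print("no detection:", no_detection_cost)
```

Scoring detectors by total cost rather than by accuracy is the point of the sketch: with skewed data, a detector that never flags anything can be highly accurate and still incur the largest loss.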


Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. To handle large data sets, scalable and efficient implementations of data mining approaches are required, generally employing computationally intensive algorithms. Conventional Java implementations do not directly support the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework that employs task and data parallelism and is implemented in High Performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions are discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the Message Passing Interface (MPI) as middleware, and can support different analysis algorithms, wrapped as Java objects and linked to various databases through the Java Database Connectivity (JDBC) interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.
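The data-parallel pattern mentioned in this abstract (split the data, run the same analysis on each partition, merge the partial results) can be sketched as follows. This is an illustration only: it uses Python's standard-library thread pool on a single machine rather than HPJava or MPI, and the item-counting task, function names, and sample baskets are all hypothetical, not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(records: Sequence[List[str]], n_parts: int) -> List[List[List[str]]]:
    """Split the records into n_parts roughly equal, interleaved slices."""
    return [list(records[i::n_parts]) for i in range(n_parts)]


def local_counts(part: Sequence[List[str]]) -> Counter:
    """Count, within one partition, how many baskets contain each item."""
    counts: Counter = Counter()
    for basket in part:
        counts.update(set(basket))  # count each item once per basket
    return counts


def parallel_item_counts(records: Sequence[List[str]], n_workers: int = 4) -> Counter:
    """Data parallelism: same function on every partition, then a merge step."""
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_counts, parts))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


baskets = [["a", "b"], ["a", "c"], ["b", "c", "a"], ["c"]]
merged = parallel_item_counts(baskets, n_workers=2)
print(merged)
```

In a cluster setting the partitions would live on different workstations and the merge would be a message-passing reduction; the thread pool here only mimics that structure.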




Articles published on Distributed Data Mining

Communication-efficient distributed mining of association rules

Application of Parallel and Distributed Data Mining in e-Commerce

10.1109/DSDE.2010.25

Distributed data mining in credit card fraud detection

The Distributed Data-Mining Workshop

A Distributed Framework for Parallel Data Mining Using HPJava
