Society is experiencing a flood of data of many types, along with a growing desire to discover and use the information it contains effectively. Moreover, this data is changing in increasingly numerous and complex ways. In particular, for data that is generated intermittently, attention has focused on data streams, where sensor network and stream mining technologies are used to discover useful information. In this paper, we focus on classification learning, one of the analytical methods of stream mining. We are concerned with a decision tree learner called the Very Fast Decision Tree (VFDT) learner, which treats real data as a data stream. We analyze credit card transaction data as a data stream and detect fraudulent use. In recent years, the number of credit card users has increased, and the damage caused by fraudulent use has increased with it. Detecting fraudulent use by data stream mining is therefore required. However, in some data, such as credit card transaction data, the proportions of the classes are extremely imbalanced. We therefore propose and implement new statistical criteria for the node construction algorithm of the VFDT. We also evaluate whether this method can be applied to data streams with imbalanced class distributions.

Recent developments in information processing techniques have enabled us to accumulate large-scale data, and the need to discover and utilize useful information in this data is growing. Because of this, data mining, a technology for discovering useful information in collected data, has attracted considerable attention. However, with the spread of the Internet and the development of sensor techniques, this data is constantly changing and growing more complex, and increasing amounts of it must be handled in real time. New stream mining techniques are required to process such large-scale data, which arrives intermittently and at irregular intervals as a data stream.
Stream mining uses various analytical methods; among them, classification learning is gaining considerable attention. Many classification learning methods have been proposed, among which decision tree learning is commonly used because it is fast and the derived classifiers are easy to interpret. One decision tree learner that supports data streams is the Very Fast Decision Tree (VFDT) [1]. As data arrives, the tree grows gradually while the data is classified. Credit card transaction data can be regarded as a data stream, so it is possible to detect fraudulent use by classifying transaction data with the VFDT. However, for some data, such as the credit card transaction data discussed in this study, the class proportions are extremely imbalanced. When such data is processed as a data stream, the accuracy of the VFDT can be reduced [2, 3]. In this study, we propose a node construction algorithm that is applicable to data streams with imbalanced class distributions. We also implement and evaluate the criteria for constructing nodes.

This paper is organized as follows. Section 2 explains the VFDT. Section 3 describes our proposed method, which constructs a VFDT from imbalanced distribution data streams. Section 4 verifies the effectiveness of the proposed method through experiments. Section 5 presents and discusses the experimental results. The final section concludes and discusses future work.
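
As background for the incremental tree growth mentioned above: the VFDT as originally described by Domingos and Hulten decides when to turn a leaf into an internal node with the Hoeffding bound, which limits how far an average over n observed examples can lie from its true value. The sketch below illustrates that standard test only; it is not the new criteria proposed in this paper. The function names, the default values of `delta` and `tie_threshold`, and the choice of information gain as the split measure are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a variable
    with the given range lies within the returned epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, num_classes=2,
                 delta=1e-7, tie_threshold=0.05):
    """Return True if a leaf that has seen n examples may be split.

    best_gain and second_gain are the split-measure values (here assumed to be
    information gain) of the two best candidate attributes at the leaf.
    delta and tie_threshold are illustrative defaults, not values from the paper.
    """
    # Information gain over num_classes classes is at most log2(num_classes).
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is reliably better than the runner-up,
    # or when the bound is so tight that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


# With 100 examples the bound is still too loose; with 1000 it permits a split.
print(should_split(0.30, 0.10, n=100))
print(should_split(0.30, 0.10, n=1000))
```

Because the bound shrinks only as more examples reach a leaf, the tree grows gradually as the stream is consumed, which matches the behaviour described in the introduction.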