Abstract

<p>This paper proposes a new method for big data analysis, the multi-granularity generalized functions data model (MGGF). The method uses dynamic adaptive multi-granularity clustering to replace the grid-like "hard partitioning" that the generalized functions data model (GFDM) imposes on the input data space with a multi-granularity partitioning, and identifies the multi-granularity pattern classes in that space. The MGGF model is then established by defining the type of mapping between the multi-granularity pattern classes and the decision categories, ftype: Ci→y, together with the Degree of Fulfillment, DoF(x), of the input data with respect to the classification rules of each pattern class. Experimental results on different data sets show that, compared with the GFDM method, the proposed method has better data summarization ability, stronger noise handling ability and higher search efficiency.</p>

Highlights

  • Knowledge discovery in databases and data mining usually face two basic problems: selecting an appropriate data representation and formalizing the model indexes

  • To evaluate the effect of multi-granularity clustering on the data set in the input space, we assume a one-to-one or many-to-one relationship, rather than a many-to-many relationship, between the pattern classes in the input space and the class values of the decision variable. That is, no input pattern class belongs to several decision categories at the same time, which is consistent with most practical situations (Note: the excluded case can be handled by supervised multi-granularity clustering in the input-output product space, which is elaborated in a separate paper)

  • We find that: (1) the cumulative model classification accuracy (cma) of the MGGF model built with the proposed method does not differ significantly from that of the generalized functions data model (GFDM) method or the C4.5 method
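
The one-to-one or many-to-one assumption in the second highlight can be checked mechanically once every sample has a pattern-class index and a decision class. The following is a minimal sketch, not the paper's procedure: the function name `mapping_type` and the return label "ambiguous" are hypothetical and introduced here only for illustration.

```python
from collections import defaultdict


def mapping_type(cluster_ids, labels):
    """Classify the relation between pattern classes and decision classes.

    cluster_ids[i] is the pattern class of sample i, labels[i] its decision
    class. Both names are illustrative assumptions, not from the paper.
    """
    labels_per_cluster = defaultdict(set)
    clusters_per_label = defaultdict(set)
    for c, y in zip(cluster_ids, labels):
        labels_per_cluster[c].add(y)
        clusters_per_label[y].add(c)

    # A pattern class with several decision classes is the excluded case.
    if any(len(ys) > 1 for ys in labels_per_cluster.values()):
        return "ambiguous"
    # Several pattern classes sharing one decision class: many-to-one.
    if any(len(cs) > 1 for cs in clusters_per_label.values()):
        return "many-to-one"
    return "one-to-one"
```

A result of "ambiguous" would indicate that at least one pattern class spans several decision categories, which is the situation the highlight defers to supervised clustering in the input-output product space.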

Summary

Introduction

Knowledge discovery in databases (KDD) and data mining (DM) usually face two basic problems: selecting an appropriate data representation and formalizing the model indexes. The generalized functions data model (GFDM) [6] was proposed to describe a subset concept X in the universe of discourse by establishing a mapping relation type: C → X between the equivalence classes of the universe of discourse and the decision classes. Building on the VPM and GFDM, this paper proposes a new big data analysis method, the multi-granularity generalized functions data model (MGGF). It replaces the equivalence partition that the VPM and GFDM impose on the data space with a multi-granularity clustering classification, so that finding the optimal GFDM model becomes the optimization of a classification objective function, which effectively avoids the NP-hard problem. The resulting partition can reach the optimum over the whole universe of discourse of the data set according to the different pattern classes. It has strong data generalization ability and is not sensitive to noise, which overcomes a defect of the GFDM model, whose generalization ability depends heavily on its prior partitioning of the universe of discourse.
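
The idea of replacing a fixed grid with pattern classes of different granularity, each mapped to one decision class, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the cluster centers and widths are taken as given rather than found by the paper's adaptive clustering, the type mapping is a simple majority vote, and the Gaussian form of `dof` is a stand-in, since this excerpt does not give the paper's definition of DoF(x).

```python
import math
from collections import Counter


def sq_dist(x, c):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))


def fit_type_map(points, labels, centers):
    """Map each pattern class (center index) to its majority decision class."""
    votes = {}
    for x, y in zip(points, labels):
        votes.setdefault(nearest(x, centers), Counter())[y] += 1
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}


def dof(x, center, width):
    """Hypothetical degree of fulfillment: Gaussian membership in (0, 1]."""
    return math.exp(-sq_dist(x, center) / (2.0 * width ** 2))


def classify(x, centers, widths, type_map):
    """Return (decision class, DoF) of the pattern class x fulfils best."""
    best = max(type_map, key=lambda i: dof(x, centers[i], widths[i]))
    return type_map[best], dof(x, centers[best], widths[best])
```

Giving each pattern class its own width is what stands in for granularity here: a wide class summarizes a large region with one rule, while a narrow class captures a fine-grained pattern.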

Multi-granularity Functions Data Model
Adaptive Dynamic Multi-granularity Partitioning of the Input Data Space
Definition of Multi-granularity Functions Data Model
Characteristics of the Multi-granularity Functions Data Model
Studies of the Experimental Data
Conclusion
Findings