Abstract

We present a new classification algorithm for machine learning on numerical data based on direct and inverse fuzzy transforms. In our previous work, fuzzy transforms were used for numerical attribute dependency in data analysis: the multi-dimensional inverse fuzzy transform was used to approximate the regression function. The classification method presented here is also based on this operator. Specifically, we apply the K-fold cross-validation algorithm to control the presence of over-fitting and to estimate the accuracy of the classification model: for each training (resp., testing) subset, an iterative process evaluates the best fuzzy partitions of the inputs. Finally, a weighted mean of the multi-dimensional inverse fuzzy transforms calculated for each training (resp., testing) subset is used for data classification. We compare this algorithm with five other classification methods on well-known datasets.

Highlights

  • In this paper we propose a new classification algorithm, called multi-dimensional F-transform classification, in which the direct and inverse multi-dimensional fuzzy transforms are used to classify instances

  • Our goal is to build a robust classification model based on the multi-dimensional fuzzy transform; it integrates a K-fold cross-validation resampling method to overcome data over-fitting, one of the major problems of classification models

  • We present our experiments on a sample of over 100 known datasets extracted from the UCI Machine Learning repository and from the Knowledge Extraction based on Evolutionary Learning (KEEL) dataset repository


Summary

Introduction

In this paper we propose a new classification algorithm, called multi-dimensional F-transform classification (for short, MFC), in which the direct and inverse multi-dimensional fuzzy transforms are used to classify instances. Our goal is to build a robust classification model based on the multi-dimensional fuzzy transform; it integrates a K-fold cross-validation resampling method to overcome data over-fitting, one of the major problems of classification models. The presence of under-fitting is evaluated by measuring a performance index after the learning process. Like under-fitting, over-fitting is a problem: it occurs when a machine learning algorithm captures noise in the data, so that the training data are fitted almost perfectly but the resulting model generalizes poorly to unseen data. There are two main techniques to limit over-fitting in machine learning algorithms: measuring accuracy on a separate validation dataset, or adopting a resampling technique that uses random subsets of the data.
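The resampling technique mentioned above can be sketched as a plain K-fold loop: shuffle the data, split it into K folds, train on K-1 folds, and score on the held-out fold. This is a generic illustration of K-fold cross-validation, not the paper's exact MFC procedure; the helper names and the toy classifier are assumptions.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit, score, k=5):
    """K-fold resampling: train on k-1 folds, evaluate on the held-out fold."""
    folds = kfold_indices(len(X), k)
    scores = []
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[folds[i]], y[folds[i]]))
    return float(np.mean(scores))

# toy example: a constant "majority class" classifier on a balanced dataset
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)
fit = lambda Xtr, ytr: int(round(float(np.mean(ytr))))  # learned class label
score = lambda m, Xte, yte: float(np.mean(yte == m))    # accuracy on held-out fold
mean_accuracy = cross_validate(X, y, fit, score, k=5)
```

Averaging the per-fold scores gives an accuracy estimate that uses every sample for both training and testing, which is what makes the technique effective against over-fitting.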

The MFC algorithm
Multi‐dimensional F‐transforms
Proposed algorithm
Tests and results
Conclusions
