Abstract
We present a new classification algorithm for machine learning on numerical data, based on the direct and inverse fuzzy transforms. In our previous work, fuzzy transforms were used to analyse numerical attribute dependencies in data: the multi-dimensional inverse fuzzy transform was used to approximate the regression function. The classification method presented here is likewise based on this operator. Specifically, we apply the K-fold cross-validation algorithm to control the presence of over-fitting and to estimate the accuracy of the classification model: for each training (resp., testing) subset, an iterative process evaluates the best fuzzy partitions of the inputs. Finally, a weighted mean of the multi-dimensional inverse fuzzy transforms calculated for each training (resp., testing) subset is used to classify the data. We compare this algorithm with five other classification methods on well-known datasets.
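To make the underlying operator concrete, here is a minimal sketch of the one-dimensional direct and inverse fuzzy (F-)transform, assuming a uniform fuzzy partition of an interval [a, b] with triangular basic functions. All function and variable names are illustrative, not taken from the paper, and the multi-dimensional case used by the proposed algorithm is not shown.

```python
def triangular_partition(a, b, n):
    """Build a uniform fuzzy partition of [a, b] with n >= 2 nodes and
    triangular basic functions (a Ruspini partition: the memberships
    sum to 1 at every point of [a, b])."""
    h = (b - a) / (n - 1)
    nodes = [a + k * h for k in range(n)]

    def basic(k):
        xk = nodes[k]
        def A(x):
            # Triangle centred at node xk with half-width h.
            return max(0.0, 1.0 - abs(x - xk) / h)
        return A

    return nodes, [basic(k) for k in range(n)]

def direct_ft(xs, ys, basics):
    """Direct F-transform of the data (xs, ys): each component is the
    weighted mean of the ys with respect to one basic function."""
    comps = []
    for A in basics:
        w = [A(x) for x in xs]
        s = sum(w)
        comps.append(sum(wi * yi for wi, yi in zip(w, ys)) / s if s else 0.0)
    return comps

def inverse_ft(x, comps, basics):
    """Inverse F-transform: approximate the function at x by combining
    the components, weighted by the basic functions."""
    return sum(F * A(x) for F, A in zip(comps, basics))
```

For example, computing the direct F-transform of uniformly sampled data from f(x) = x on [0, 1] with five basic functions yields components close to the node values, and the inverse F-transform reconstructs an approximation of f.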
Highlights
In this paper we propose a new classification algorithm, called multi-dimensional F-transform classification, in which the direct and inverse multi-dimensional fuzzy transforms are used to classify instances
Our goal is to build a robust classification model based on the multi-dimensional fuzzy transform that integrates a K-fold cross-validation resampling method to overcome data over-fitting, one of the major problems of classification models
We present our experiments on a sample of over 100 datasets extracted from the UCI Machine Learning Repository and from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository
Summary
In this paper we propose a new classification algorithm, called multi-dimensional F-transform classification (for short, MFC), in which the direct and inverse multi-dimensional fuzzy transforms are used to classify instances. Our goal is to build a robust classification model based on the multi-dimensional fuzzy transform that integrates a K-fold cross-validation resampling method to overcome data over-fitting, one of the major problems of classification models. The presence of under-fitting is evaluated by measuring a performance index after the learning process. Over-fitting, in contrast to under-fitting, occurs when a machine learning algorithm captures noise in the data: the training data are fitted optimally, but the model generalizes poorly to unseen data. There are two main techniques for limiting over-fitting in machine learning algorithms: measuring accuracy on a separate validation dataset, or adopting a resampling technique that uses random subsets of the data.
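The resampling technique mentioned above can be sketched as follows: a minimal K-fold cross-validation loop, assuming a generic `fit`/`predict` pair. The placeholder majority-class learner below is purely illustrative and is not the paper's MFC model.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k (almost)
    equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, predict):
    """For each fold: train on the other k-1 folds, test on the
    held-out fold. The mean of the per-fold accuracies estimates
    generalization accuracy, exposing over-fitting."""
    folds = k_fold_indices(len(xs), k)
    accs = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_x = [xs[i] for i in range(len(xs)) if i not in test_set]
        train_y = [ys[i] for i in range(len(ys)) if i not in test_set]
        model = fit(train_x, train_y)
        hits = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return accs

# Placeholder learner: always predict the majority class of the
# training labels (illustrative only).
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def predict_majority(model, x):
    return model
```

Because every instance is held out exactly once, the mean fold accuracy reflects performance on unseen data rather than on the training set, which is why this design is a standard defence against over-fitting.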
More From: Journal of Ambient Intelligence and Humanized Computing