Abstract

The paper proposes a solution to the classification problem by calculating a sequence of matrices of feature indices that approximate invariants of the data matrix. Here, the feature index is the index of the interval that contains a feature value, and the number of intervals is a parameter. Objects with equal indices form granules, including information granules, which correspond to the objects of the training sample of a certain class. From the ratios of the lengths of the information granules, we obtain the frequency intervals of any feature that are the same for the appropriate objects of the control sample. Then, for an arbitrary object, we estimate its probability in each class and assign the object to the class with the maximum probability. For a sequence of parameter values, we find a converging sequence of error rates. An additional effect is created by the parameters aimed at increasing the data variety and compressing rare data. The high accuracy and stability of the results obtained using this method have been confirmed for nine data sets from the UCI repository. The proposed method has advantages over existing ones due to the algorithm's simplicity and universality, as well as the accuracy of the solutions.
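
The pipeline outlined in the abstract (interval indices, per-class frequencies, maximum-probability class) can be illustrated with a minimal sketch. This is not the authors' algorithm: the equal-width intervals, the averaging of per-feature frequencies into a single class score, and the names `feature_index`, `fit`, and `predict` are all assumptions made for illustration, and the toy data is invented.

```python
from collections import Counter


def feature_index(value, lo, hi, n_intervals):
    """Index (0 .. n_intervals - 1) of the equal-width interval holding value.

    Equal-width intervals are an assumption of this sketch.
    """
    if hi == lo:
        return 0
    pos = int((value - lo) / (hi - lo) * n_intervals)
    # Clamp so the maximum value and out-of-range values stay in bounds.
    return min(max(pos, 0), n_intervals - 1)


def fit(X, y, n_intervals):
    """Count, per class, how often each feature falls into each interval."""
    n_features = len(X[0])
    bounds = [
        (min(row[j] for row in X), max(row[j] for row in X))
        for j in range(n_features)
    ]
    class_sizes = Counter(y)
    counts = Counter()
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            idx = feature_index(value, bounds[j][0], bounds[j][1], n_intervals)
            counts[(label, j, idx)] += 1
    return bounds, class_sizes, counts


def predict(model, row, n_intervals):
    """Return the class with the highest score, plus all class scores.

    The score is the mean per-feature relative frequency, which is an
    assumed way of combining features, not the paper's design formula.
    """
    bounds, class_sizes, counts = model
    scores = {}
    for label, size in class_sizes.items():
        freqs = []
        for j, value in enumerate(row):
            idx = feature_index(value, bounds[j][0], bounds[j][1], n_intervals)
            freqs.append(counts[(label, j, idx)] / size)
        scores[label] = sum(freqs) / len(freqs)
    return max(scores, key=scores.get), scores


# Invented toy data: two features, two well-separated classes.
X_train = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
           [3.0, 20.0], [3.2, 21.0], [2.9, 19.0]]
y_train = ["a", "a", "a", "b", "b", "b"]
N_INTERVALS = 2  # the parameter: number of intervals per feature

model = fit(X_train, y_train, N_INTERVALS)
label_low, scores_low = predict(model, [1.1, 10.5], N_INTERVALS)
label_high, scores_high = predict(model, [3.1, 20.5], N_INTERVALS)
```

Repeating `fit` and `predict` for several values of `N_INTERVALS` and recording the error rate on a control sample would give a sequence of error rates of the kind the abstract refers to; whether that sequence converges depends on the data and on the actual design formulas, which this sketch does not reproduce.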

Highlights

  • The feature index is the index of the interval that contains a feature value, and the number of intervals is a parameter

  • The classification problem is the central problem in machine learning, and methods for solving it are dealt with in a considerable number of research papers, which is constantly growing

  • Granular computing is a paradigm of research in the field of artificial intelligence


Summary

Introduction

The classification problem is the central problem in machine learning, and the number of research papers dealing with methods for solving it is constantly growing. The existing methods have been unable to consider these factors, as they use mathematical tools within the framework of the formalization of pure mathematics. Such approaches have another drawback: in solving the problem, one must assume the existence of a metric in the feature space and of a probability density function for the objects of each class. Studies in recent decades on the principles of information processing in complex systems open up new possibilities for solving the problem and eliminate these gaps in the theory. Most of these works were triggered by soft computing theory and were based on the concept of an organism as a granular system. Granular computing is a research paradigm in the field of artificial intelligence. It covers multiple concepts for modeling information processing in various hierarchical systems, as well as new approaches to learning with fuzzy databases [8] [9]. The method has been cross-checked on nine data sets from the UCI repository [16].

Transformation of Original Information
Design Formulas
Existence and Accuracy of the Solution
The Results of Solving Particular Tasks
Conclusions
