Abstract
Categorical data are an important kind of data in machine learning. Generally, rough set theory (RS-theory) deals with categorical data as follows. First, an equivalence relation based on the equality of attribute values is established. Then, information granules (I-granules) based on the resulting equivalence classes are obtained. Finally, information structures (I-structures) consisting of I-granules are formed. However, an equivalence relation is too strict, and the I-structure of a categorical information system (CIS) has limitations that may filter out potentially useful information. This paper investigates fuzzy information structures (FI-structures) and new uncertainty measurements for categorical data from the perspective that "the equality of attribute values is fed back to the attribute set". First, a fuzzy symmetric relation based on the number of attributes with equal attribute values is established. Then, fuzzy information granules (FI-granules) based on this fuzzy symmetric relation are obtained. Next, FI-structures consisting of FI-granules are formed. Finally, some concepts related to FI-structures in a CIS are given. A set vector is used to represent FI-structures, and the inclusion degree is used to study the dependence between FI-structures. In addition, four new uncertainty measurements based on FI-structures in a CIS are proposed: fuzzy information granulation (Gf), fuzzy information entropy (Hf), fuzzy rough entropy (Erf) and fuzzy information amount (Ef). Moreover, numerical experiments and statistical tests are carried out to evaluate the performance of the proposed measurements. The results of the paired t-test show that the four new measurements based on FI-structures outperform the corresponding four measurements based on I-structures. Finally, attribute reduction algorithms based on Gf and Hf are presented, and clustering analysis is conducted on the reduced CIS. The experimental results show that the proposed algorithms are effective and perform well on attribute reduction according to three evaluation indicators of clustering performance.
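The following is a minimal sketch, not the authors' exact formulation, of the core idea described above: the fuzzy similarity degree between two objects is assumed to be the fraction of attributes on which their categorical values agree, each row of the resulting relation is taken as an FI-granule, and Gf and Hf are assumed to generalize the classical granulation and entropy measures by replacing the crisp cardinality of an equivalence class with the fuzzy cardinality of the corresponding FI-granule. All function names and the toy data are illustrative.

```python
# Sketch of FI-granules and two assumed uncertainty measures (Gf, Hf) for a CIS.
import numpy as np

def fuzzy_similarity_matrix(data):
    """data: (n_objects, n_attributes) array of categorical values.
    Returns R with R[i, j] = (# attributes where objects i and j agree) / n_attributes.
    R is reflexive and symmetric, but generally not transitive."""
    n, m = data.shape
    R = np.empty((n, n))
    for i in range(n):
        R[i] = (data == data[i]).sum(axis=1) / m
    return R

def fuzzy_granulation(R):
    """Gf (assumed form): average normalized fuzzy cardinality of the FI-granules."""
    n = R.shape[0]
    return R.sum() / (n * n)

def fuzzy_entropy(R):
    """Hf (assumed form): entropy-style measure over the FI-granules."""
    n = R.shape[0]
    card = R.sum(axis=1)          # fuzzy cardinality of each FI-granule
    return -(np.log2(card / n)).mean()

# Toy categorical information system: 4 objects, 3 attributes.
data = np.array([
    ["red",  "small", "round"],
    ["red",  "small", "square"],
    ["blue", "large", "round"],
    ["blue", "small", "round"],
])
R = fuzzy_similarity_matrix(data)
print("Gf =", fuzzy_granulation(R), "Hf =", fuzzy_entropy(R))
```

Under this reading, attribute reduction based on Gf or Hf would search for a minimal attribute subset whose FI-structure preserves the measure computed on the full attribute set; the paper's precise definitions and reduction algorithms should be consulted for the exact criteria.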