Local fuzzy rough attribute reduction for large-scale mixed data with limited missing labels based on local fuzzy self information

Zhaowen Li, Run Guo, Ning Lin, Tao Lu

doi:10.1016/j.ins.2024.121613

Abstract

The advent of the big data era has brought large volumes of data of many different types, and extracting potential value and rules from such data remains a challenge. Owing to various external and internal factors, it is common for large-scale data to have a limited number of missing labels. To handle a large-scale mixed data information system with limited missing labels (LSMDISLML), the local neighborhood rough set model (LNRS-model) is typically employed. However, such a model often applies the same neighborhood radius to all numerical attributes, which can weaken the classification ability of the data. The local fuzzy rough set model (LFRS-model) can overcome this limitation. This paper studies local fuzzy rough attribute reduction for large-scale mixed data with limited missing labels based on the LFRS-model, using local fuzzy self information and an overlap degree function. First, fuzzy relations on the entire sample set are established from the statistical distribution of the data; this allows different fuzzy similarity radii to be used when computing similarity, so that the relations adapt to different data distributions. Next, the samples with missing labels are discarded, since they make up only a small proportion of the sample set and have little impact on the overall performance of the dataset. The limited computing resources and storage space are then concentrated on the samples with complete labels (called the target set). On the target set, local fuzzy λ-upper and λ-lower approximations are defined and the LFRS-model is constructed. This model not only reduces processing time and sources of error in large-scale data, but also improves data quality and the reliability of the experimental results. Then, local fuzzy λ-self information is introduced and used to design a local fuzzy rough attribute reduction algorithm for an LSMDISLML.
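
The first stages described above (fuzzy relations built on all samples, unlabeled samples dropped, approximations computed locally on the target set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: the abstract gives no formulas, so the sketch assumes min-max normalized numeric attributes with a per-attribute radius equal to that attribute's standard deviation, crisp equality for categorical attributes, min to combine attributes, and the standard fuzzy rough lower approximation (Kleene-Dienes implicator) and upper approximation (min t-norm). The λ parameter is not modeled, and all function names (`target_indices`, `numeric_relation`, `local_approximations`, etc.) are hypothetical.

```python
# Sketch under assumptions; NOT the paper's definitions of the relation or approximations.
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def target_indices(labels: Sequence[Optional[str]]) -> List[int]:
    """Indices of labeled samples (the target set); None marks a missing label."""
    return [i for i, y in enumerate(labels) if y is not None]


def numeric_relation(col: Sequence[float]) -> Matrix:
    """Assumed fuzzy similarity on one numeric attribute, computed on ALL samples."""
    lo, hi = min(col), max(col)
    span = (hi - lo) or 1.0
    z = [(v - lo) / span for v in col]  # min-max normalize to [0, 1]
    radius = statistics.pstdev(z)  # assumed per-attribute radius: the attribute's spread
    n = len(z)
    return [[1.0 - abs(z[i] - z[j]) if abs(z[i] - z[j]) <= radius else 0.0
             for j in range(n)] for i in range(n)]


def categorical_relation(col: Sequence[str]) -> Matrix:
    """Crisp equality relation on one categorical attribute."""
    n = len(col)
    return [[1.0 if col[i] == col[j] else 0.0 for j in range(n)] for i in range(n)]


def combined_relation(relations: Sequence[Matrix]) -> Matrix:
    """Combine per-attribute relations with min (fuzzy intersection)."""
    n = len(relations[0])
    return [[min(r[i][j] for r in relations) for j in range(n)] for i in range(n)]


def local_approximations(rel: Matrix, labels: Sequence[Optional[str]],
                         cls: str) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Lower/upper memberships for decision class `cls`, using labeled samples only."""
    targets = target_indices(labels)
    members = [i for i in targets if labels[i] == cls]
    others = [i for i in targets if labels[i] != cls]
    # Lower: min over labeled other-class samples of 1 - R(x, y), only for class members.
    lower = {x: min((1.0 - rel[x][y] for y in others), default=1.0) for x in members}
    # Upper: each labeled sample's maximal similarity to a member of the class.
    upper = {y: max((rel[y][x] for x in members), default=0.0) for y in targets}
    return lower, upper


if __name__ == "__main__":
    rel = combined_relation([
        numeric_relation([0.0, 1.0, 2.0, 10.0]),
        categorical_relation(["a", "a", "b", "b"]),
    ])
    labels = ["pos", "neg", "neg", None]  # sample 3 has a missing label
    lower, upper = local_approximations(rel, labels, "neg")
    print(lower, upper)
```

In this sketch the unlabeled sample still influences the relation (through the normalization and the radius) but is excluded from every approximation, which mirrors the split between the entire sample set and the target set described above.
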
Furthermore, an overlap degree function is introduced to evaluate the attributes and reorder them by importance, so that redundant attributes with high overlap and low importance are eliminated first from the preordered attribute set. This strategy improves the efficiency of finding the optimal attribute subset. Finally, a series of experiments is carried out. The experimental results show that the proposed algorithm achieves excellent performance in both classification and outlier detection tasks, outperforming four existing algorithms.
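
The preordering and redundancy elimination described above can be illustrated with a small backward-elimination sketch. The abstract does not define the overlap degree function or the local fuzzy λ-self information, so here the per-attribute overlap and importance values and the subset measure `score` are simply supplied by the caller. The names `elimination_order`, `backward_reduce`, and `toy_score` are hypothetical, and sorting by overlap first and importance second is an assumption about how the two criteria are combined.

```python
# Sketch under assumptions: overlap, importance, and score are inputs, not the paper's formulas.
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical subset-evaluation measure (e.g. a dependency or self-information value);
# higher is assumed to be better.
Score = Callable[[FrozenSet[str]], float]


def elimination_order(attrs: Sequence[str], overlap: Dict[str, float],
                      importance: Dict[str, float]) -> List[str]:
    """Preorder attributes: highest overlap first, ties broken by lowest importance."""
    return sorted(attrs, key=lambda a: (-overlap[a], importance[a]))


def backward_reduce(attrs: Sequence[str], overlap: Dict[str, float],
                    importance: Dict[str, float], score: Score,
                    tol: float = 0.0) -> List[str]:
    """Drop attributes in preorder while the full-set score is preserved (within tol)."""
    kept = set(attrs)
    baseline = score(frozenset(kept))
    for a in elimination_order(attrs, overlap, importance):
        trial = frozenset(kept - {a})
        # Never empty the subset; remove `a` only if the score does not drop.
        if trial and score(trial) >= baseline - tol:
            kept = set(trial)
    return [a for a in attrs if a in kept]  # report in the original attribute order


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical measure: 'a' and 'c' together carry all the information."""
    if {"a", "c"} <= subset:
        return 1.0
    return 0.5 if "a" in subset else 0.0


if __name__ == "__main__":
    overlap = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.9}
    importance = {"a": 0.8, "b": 0.1, "c": 0.6, "d": 0.3}
    print(backward_reduce(["a", "b", "c", "d"], overlap, importance, toy_score))
    # -> ['a', 'c']
```

Because removals are greedy, the preorder decides which of two mutually redundant attributes is discarded: the one visited first is removed while the other is kept.
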

Published in: Information Sciences

Similar Papers

Nonparametric data reduction approach for large-scale survival data analysis
Keivan Sadeghzadeh ... Nasser Fard
01 Jan 2015

Computational Recovery of Information From Low-quality and Missing Labels
Feng Bao
01 Jan 2020

EScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform
Jie Li ... Deb Agarwal
01 Jan 2009

Addressing Imbalance in Weakly Supervised Multi-Label Learning
Fang-Fang Luo ... Wen-Zhong Guo
IEEE Access | VOL. 7
01 Jan 2019
