Abstract

Whether a label in a dataset counts as noisy depends on factors such as filtering levels, data patterns, relabeling policies, and pre-defined regulations. Focusing on filtering acceptance levels (FALs), we propose a decision support model (DSM-ENL) and implement it on real, granulated datasets. The model detects noisy labels and profiles the effects of 50 FALs (51%–100%) on noise rate (NoR), classification accuracy (CA), and area under the ROC curve (AUC). An illustrative case demonstrates DSM-ENL, followed by experiments on 27 unsupervised and 29 supervised granulated datasets using five classifiers with a 70%/30% training/testing split. The findings show that: (1) the classifier achieving the best CA varies with FAL; (2) NoR, CA, and AUC oscillate as FAL increases; (3) lower FALs (from 0.51 to 0.60) are more likely to lead to higher CAs; and (4) the correlation coefficients among dataset characteristics, FAL, NoR, CA, and AUC are not high compared with those between CA and NoR and between CA and AUC. The research demonstrates the value of profiling how FALs applied to datasets with noisy labels affect learning performance.
