Abstract
In the age of big data, much of the data collected is of low quality, characterized by heterogeneity and incompleteness. Such data are referred to in this paper as heterogeneous incomplete decision systems (HIDSs). Data classification is an important task in machine learning and can uncover valuable knowledge hidden in HIDSs. However, systematic studies on data classification in HIDSs are rarely reported. In particular, there is a lack of adaptive classification methods that can deal directly with heterogeneous incomplete data without first discretizing numerical attributes or filling in missing values. In this paper, a unified representation model, called the parameterized tolerance granulation model (PTGM), is proposed to deal with heterogeneous incomplete data. The principle of an adaptive granulation method for constructing appropriate PTGMs, based on difference-based collaborative optimization, is also described. Based on PTGMs, decision logic language is used to describe classifiers consisting of decision rules that satisfy given conditions. Two classification methods are then proposed: a discernibility function-based method, which obtains all optimized rule sets (classifiers), and a heuristic function-based method, which generates a particular optimized rule set. The heuristic function-based method is an adaptive classification method that can deal directly with heterogeneous incomplete data. Furthermore, detailed theoretical analyses are given to establish the correctness and effectiveness of the proposed methods. The experimental results show that the proposed methods are effective and have clear advantages in directly handling heterogeneous incomplete data.