Abstract

Interval-valued data (IVD) are data in which each feature is an interval, which captures uncertainty and variability. However, missing values (a missing lower bound, a missing upper bound, or both) may arise during data acquisition and transmission and can hinder data processing. To obtain good results, it is important to handle the missing values in IVD, typically by ignoring or filling them. A dataset that contains missing values is referred to here as an incomplete interval-valued (IIV) dataset. Several ignoring and filling methods have been proposed for numeric or symbolic data, but they cannot be applied directly to IIV datasets. In this work, a reliable k-nearest neighbor approach (RKNN) for incomplete interval-valued data (IIVD) is proposed. A combined rule is designed to determine whether a datum with missing values should be ignored or filled. Samples that have a missing value in every feature are ignored directly; unlike existing ignoring methods, this requires no preset percentage of missing entries. The remaining incomplete samples are filled according to their K complete nearest neighbors, which makes the filled values more reliable. In this way, RKNN excludes a small number of incomplete samples that may increase uncertainty and avoids the repeated filled values that result from filling with a median or a fixed constant. Experimental results on 12 synthetic datasets and 4 real-world datasets demonstrate that the proposed method processes incomplete interval-valued data effectively while achieving good classification performance.
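
The ignore-or-fill rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure or how neighbor values are combined, so the sketch assumes a Euclidean distance over the observed bounds and fills each missing bound with the mean of that bound over the K nearest complete samples. The names `has_missing`, `observed_distance`, and `rknn_fill` are hypothetical, and `None` is an assumed encoding for a missing bound.

```python
import math


def has_missing(interval):
    """True if the lower or upper bound of a (lower, upper) interval is missing."""
    lower, upper = interval
    return lower is None or upper is None


def observed_distance(sample, candidate):
    """Euclidean distance over the bounds observed in `sample`.

    Assumption: the abstract does not name a distance, so missing bounds
    are simply skipped when comparing against a complete candidate.
    """
    total = 0.0
    for (lo, hi), (c_lo, c_hi) in zip(sample, candidate):
        if lo is not None:
            total += (lo - c_lo) ** 2
        if hi is not None:
            total += (hi - c_hi) ** 2
    return math.sqrt(total)


def rknn_fill(samples, k=3):
    """Ignore or fill incomplete interval-valued samples.

    samples: list of samples; each sample is a list of (lower, upper)
    tuples, with None marking a missing bound.
    Returns the kept samples in their original order.
    """
    complete = [s for s in samples if not any(has_missing(iv) for iv in s)]
    if not complete:
        raise ValueError("at least one complete sample is needed for filling")

    kept = []
    for sample in samples:
        if not any(has_missing(iv) for iv in sample):
            kept.append(list(sample))  # complete sample: keep unchanged
            continue
        if all(has_missing(iv) for iv in sample):
            continue  # ignore rule: every feature has a missing value

        # Fill rule: use the K nearest complete samples.
        neighbors = sorted(
            complete, key=lambda c: observed_distance(sample, c)
        )[:k]
        filled = []
        for j, (lo, hi) in enumerate(sample):
            if lo is None:
                lo = sum(n[j][0] for n in neighbors) / len(neighbors)
            if hi is None:
                hi = sum(n[j][1] for n in neighbors) / len(neighbors)
            # Assumption: reorder the bounds if a filled value crosses the
            # observed one, so that the result is still a valid interval.
            filled.append((min(lo, hi), max(lo, hi)))
        kept.append(filled)
    return kept
```

Because each incomplete sample is filled from its own neighbors, two incomplete samples generally receive different values, in contrast to filling every gap with one median or constant.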
