Abstract

The Evidential K-Nearest Neighbor (EK-NN) classification rule provides a global treatment of imperfect knowledge in class labels, but it still suffers from the curse of dimensionality as well as runtime and memory restrictions when performing nearest-neighbor search, in particular for large and high-dimensional data. To avoid the curse of dimensionality, this article first proposes a rough evidential K-NN (REK-NN) classification rule in the framework of rough set theory. Based on a reformulated K-NN rough set model, REK-NN selects features, and thus reduces complexity, by minimizing a proposed neighborhood pignistic decision error rate, which accounts for both the Bayes decision error and the spatial information among samples in feature space. In contrast to existing rough set-based feature selection methods, REK-NN is a synchronized rule rather than a stepwise one, in the sense that feature selection and learning are performed simultaneously. To further handle data with large sample sizes, we derive a distributed REK-NN method and implement it on Apache Spark. Finally, a theoretical analysis of the classifier's generalization error bound is presented. The distributed REK-NN is shown to achieve good performance while drastically reducing the number of features and consuming less runtime and memory. Numerical experiments conducted on real-world datasets validate these conclusions.
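The baseline the abstract builds on is the classical evidential K-NN rule, in which each neighbor contributes a simple mass function whose strength decays with distance, the masses are pooled by Dempster's rule, and the decision is made on pignistic probabilities. A minimal NumPy sketch of that baseline (not of the proposed REK-NN itself); the kernel parameters `alpha` and `gamma` and the function name are illustrative assumptions, not taken from the article:

```python
import numpy as np

def eknn_predict(X_train, y_train, x, K=5, alpha=0.95, gamma=0.5, n_classes=None):
    """Sketch of an evidential K-NN prediction for a single query x.

    Each of the K nearest neighbors with label q induces a simple mass
    function m_i({q}) = alpha * exp(-gamma * d_i**2), m_i(Omega) = 1 - m_i({q}).
    The masses are combined with Dempster's rule and the class with the
    highest pignistic probability is returned.
    """
    if n_classes is None:
        n_classes = int(y_train.max()) + 1
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to x
    nn = np.argsort(d)[:K]                       # indices of the K nearest neighbors
    # Running combined mass: m[p] on singleton {p}, m_omega on the frame Omega.
    m = np.zeros(n_classes)
    m_omega = 1.0
    for i in nn:
        q = int(y_train[i])
        s = alpha * np.exp(-gamma * d[i] ** 2)   # evidence that x belongs to class q
        # Dempster-combine with the simple mass (s on {q}, 1 - s on Omega):
        new = m * (1.0 - s)                      # {p} ∩ Omega = {p}
        new[q] += (m[q] + m_omega) * s           # {q} ∩ {q} and Omega ∩ {q} give {q}
        m_omega *= (1.0 - s)                     # Omega ∩ Omega = Omega
        total = new.sum() + m_omega              # equals 1 - conflict
        m, m_omega = new / total, m_omega / total
    # Pignistic transform: m(Omega) is split evenly over the classes.
    betp = m + m_omega / n_classes
    return int(np.argmax(betp))
```

On a toy two-cluster dataset, a query near one cluster is assigned to that cluster's class; the curse of dimensionality the abstract targets shows up here in the distance computation, which degrades as irrelevant features are added.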
