Abstract

Ensembles based on kNN models are considered effective in reducing the adverse effect of outliers, primarily by identifying the observations closest to a test point in the given training data. The class label of the test point is then estimated by a majority vote over the class labels of these nearest observations. When identifying the closest observations, certain training patterns may carry more discriminative power than others; assigning weights to observations and computing weighted distances is therefore important. This paper proposes a kNN ensemble that identifies the nearest observations by their weighted distance, in relation to the response variable, via support vectors. This is done while building a large number of kNN models, each on a bootstrap sample from the training data along with a randomly selected subset of features from the given feature space. The estimated class of a test observation is decided by majority voting over the estimates given by all the base kNN models. The ensemble is assessed on 14 benchmark and simulated datasets against other classical methods, including kNN-based models, using the Brier score, classification accuracy and kappa as performance measures. Average values of these measures, along with boxplots, are given for the proposed method and its competitors. On both benchmark and simulated datasets, the proposed ensemble outperformed the competing methods in the majority of cases.
The suggested ensemble gave overall better classification performance than the other methods on 8 datasets. It also performed better than the other methods in a simulation study that considered the presence of noisy features in the data. The analyses reveal that the proposed method is effective in classification problems involving noisy features. Furthermore, feature weighting and randomization make the method robust to the choice of k, i.e., the number of nearest observations in a base model.
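The ensemble construction described above, many kNN base learners, each fit on a bootstrap sample and a random feature subset, combined by majority vote, can be sketched as follows. This is an illustrative sketch only: it uses plain Euclidean distances rather than the paper's support-vector-based feature weighting, and all function names and default parameters are assumptions, not the authors' implementation.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, feats):
    """Plain kNN vote: classify x using the k nearest training points,
    measuring distance only on the feature subset `feats`."""
    dists = sorted(
        (sum((xi[f] - x[f]) ** 2 for f in feats), yi)  # squared Euclidean distance
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def knn_ensemble_predict(X, y, x, n_models=25, k=3, n_feats=2, seed=0):
    """Majority vote over n_models kNN base learners, each trained on a
    bootstrap sample of (X, y) and a random subset of n_feats features.
    (Unweighted distances; the paper instead weights features via
    support vectors.)"""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample with replacement
        feats = rng.sample(range(p), min(n_feats, p))  # random feature subset
        votes.append(
            knn_predict([X[i] for i in idx], [y[i] for i in idx], x, k, feats)
        )
    return Counter(votes).most_common(1)[0][0]
```

Because each base model sees a different bootstrap sample and feature subset, individual models that latch onto noisy features are outvoted by the rest of the ensemble, which is the intuition behind the robustness to noisy features reported in the abstract.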
