Abstract

Kernel methods have become standard tools for solving classification and regression problems in statistics. An example of a kernel-based classification method is kernel Fisher discriminant analysis (KFDA). Conceptually, KFDA entails transforming the data in the input space to a high-dimensional feature space, followed by linear discriminant analysis (LDA) performed in feature space. Although the resulting classifier is linear in feature space, it corresponds to a non-linear classifier in input space. However, as in the case of LDA, the classification performance of KFDA deteriorates in the presence of influential data points. Louw et al. (Communications in Statistics: Simulation and Computation 37:2050–2062, 2008) proposed several criteria for identifying influential cases in KFDA. In extensive simulation studies these criteria have been found to be successful, in the sense that the error rate of the KFD classifier based on the data set after removal of influential cases is lower than the error rate of the KFD classifier based on the entire data set. A disadvantage is that these criteria are calculated on a leave-one-out basis, which becomes computationally expensive for large data sets. In this paper we propose a two-step procedure for identifying influential cases in large data sets. Firstly, a subset of potentially influential data cases is found by constructing the smallest enclosing hypersphere (for each group) in feature space. Secondly, the proposed criteria are employed to identify influential cases, but only cases in the subset are considered on a leave-one-out basis, leading to a substantial reduction in computation time. We investigate the merit of this new proposal in a simulation study, and compare the results with those obtained when the hypersphere is not used as a first step. We conclude that the new proposal has merit.

Keywords: Classification, Discriminant analysis, Kernel methods
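The first step of the procedure, finding cases on or near the boundary of an enclosing hypersphere in feature space, can be illustrated with a minimal sketch. The abstract does not say how the hypersphere is computed, so the code below substitutes the Badoiu-Clarkson iteration, a simple approximation to the smallest enclosing ball that needs only the kernel matrix. The function names (`rbf_kernel`, `enclosing_ball_weights`, `candidate_subset`), the RBF kernel choice, and the parameters `gamma`, `n_iter` and `tol` are all assumptions made for illustration, not the authors' implementation. The second step (the leave-one-out criteria of Louw et al.) is not shown, because the criteria are not defined in the abstract.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix; the kernel and gamma are illustrative choices."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def feature_space_sq_dist(K, alpha):
    """Squared feature-space distance of every case to the centre sum_j alpha_j phi(x_j)."""
    d2 = np.diag(K) - 2.0 * (K @ alpha) + alpha @ K @ alpha
    return np.maximum(d2, 0.0)  # guard against tiny negative rounding errors

def enclosing_ball_weights(K, n_iter=200):
    """Approximate the centre of the smallest enclosing ball (Badoiu-Clarkson iteration).

    The centre is kept as a convex combination of the mapped cases, so only
    the kernel matrix K of one group is required.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    alpha[0] = 1.0  # start with the centre at the first case
    for t in range(1, n_iter + 1):
        far = int(np.argmax(feature_space_sq_dist(K, alpha)))
        # Move the centre a fraction 1/(t+1) of the way towards the farthest case.
        alpha *= t / (t + 1.0)
        alpha[far] += 1.0 / (t + 1.0)
    return alpha

def candidate_subset(K, alpha, tol=0.05):
    """Indices of cases whose squared distance is within tol of the squared radius."""
    d2 = feature_space_sq_dist(K, alpha)
    return np.flatnonzero(d2 >= (1.0 - tol) * d2.max())
```

In use, one would build `K = rbf_kernel(X_group)` separately for each group, call `enclosing_ball_weights(K)` and then `candidate_subset(K, alpha)`, and evaluate the leave-one-out influence criteria only for the returned indices. The tolerance `tol` controls how many near-boundary cases are passed on to the second step.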
