Abstract

The exercise of combining classifiers is primarily driven by the desire to enhance the performance of a classification system. There may also be problem-specific rationale for integrating several individual classifiers. For example, a designer may have access to different types of features from the same study participant: in the human identification problem, the participant’s voice, face, and handwriting provide different types of features. In such instances, it may be sensible to train one classifier on each type of feature (Jain et al., 2000). In other situations, there may be multiple training sets, collected at different times or under slightly different circumstances, and individual classifiers can be trained on each available data set (Jain et al., 2000; Xu et al., 1992). Lastly, the demand for classification speed in online applications may necessitate the use of multiple classifiers (Jain et al., 2000). Optimal combination of multiple classifiers is a well-studied area. Traditionally, the goal of these methods is to improve classification accuracy by employing multiple classifiers to address the complexity and non-uniformity of class boundaries in the feature space. For example, classifiers with different parameter choices and architectures may be combined so that each classifier focuses on the subset of the feature space where it performs best. Well-known examples of these methods include bagging (Breiman, 1996a) and boosting (Bauer & Kohavi, 1999). Given the universal approximation ability of neural networks such as multilayer perceptrons and radial basis function networks (Haykin, 1994), there is theoretical appeal in combining several neural network classifiers to enhance classification. Indeed, several methods have been developed for this purpose, including optimal linear combinations (Ueda, 2002), mixture of experts (Jacobs et al., 1991), negative correlation learning (Chen & Yao, 2009), and evolving neural network ensembles (Yao & Islam, 2008). In these methods, all base classifiers are generally trained on the same feature space (either using the entire training set or subsets of the training set). While these methods have proven effective in many applications, they are associated with numerical instabilities and high computational complexity in some cases (Bauer & Kohavi, 1999).
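To make the feature-type idea concrete, the following sketch (not taken from any of the cited works) trains one base classifier per block of features and combines their predictions by majority vote. The synthetic data, the column blocks standing in for modalities such as voice, face, and handwriting, and the particular model choices are illustrative assumptions.

```python
# Minimal sketch: one classifier per feature block, combined by majority vote.
# Data, feature blocks, and model choices are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for multi-modal features (e.g., voice / face / handwriting).
X, y = make_classification(n_samples=600, n_features=30, n_informative=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Assume each "modality" is a disjoint block of columns.
blocks = [slice(0, 10), slice(10, 20), slice(20, 30)]
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=5),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
]

# Train one base classifier on each feature block.
fitted = [m.fit(X_tr[:, b], y_tr) for m, b in zip(models, blocks)]

# Combine the base classifiers by majority vote on the test set.
votes = np.stack([m.predict(X_te[:, b]) for m, b in zip(fitted, blocks)])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print("ensemble accuracy:", (ensemble_pred == y_te).mean())
```

In this sketch the combination rule is a plain vote; methods such as optimal linear combinations or mixture of experts instead learn weights (possibly input-dependent) for the base classifiers' outputs.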
