Case-based predictions for species and habitat mapping

Kalle Remm

doi:10.1016/j.ecolmodel.2004.03.004

Abstract

The distinguishing feature of case-based prediction is that empirical data are retained within the predictive system. Case-based methods are especially well suited to situations where a single class is represented by more than one cluster, or where a class fills an irregular shape in the feature space. Case-based ecological mapping relies on the assumption that a species (or other phenomenon) is likely to be found in locations similar to those where it has already been recorded. Lazy learning is a machine learning approach for fitting case-based prediction systems that prefers raw data to generalizations. An overview is given of lazy learning methods: feature selection and weighting, fitting the number of exemplars or the kernel extent used for decisions, indexing the case-base, learning new and forgetting useless exemplars, and exemplar weighting. A case study of habitat and forest composition mapping was carried out in the Otepää Upland in south-eastern Estonia. Habitat classes were predicted and mapped over the whole study area; five characteristics of stand composition, presence/absence of Quercus robur, total coverage of forest stand, coverage of coniferous trees, and eight main tree species separately were mapped on non-agricultural and non-settlement areas. The explanatory variables were derived from a Landsat 7 ETM image, greyscale and colour orthophotos, an elevation model, the 1:10,000 digital base map, and a soil map. One thousand random locations were described in the field to obtain training data. Four machine learning methods were compared. In the similarity calculations, exemplar and feature weights regulated both the influence of particular exemplars and features, and the kernel extent. The goodness-of-fit of predictions was estimated using leave-one-out cross-validation. A machine learning method combining stepwise feature selection, feature weighting, and exemplar weighting gave the best results for 10 response variables. A method involving iterative random sampling proved to be the best for the other seven variables. The best fit was found for the following variables: habitat class (κ = 0.85), oak presence/absence (mean true positive + mean true negative − 1 = 0.72), coverage of coniferous trees (R² = 0.80), coverage of Pinus sylvestris (R² = 0.72), and coverage of Picea abies (R² = 0.73). In most cases, fewer than half of the training instances were retained as exemplars after case filtering, and fewer than half of the explanatory variables were used in predictive sets. All 31 explanatory variables were included in a predictive set of features at least once. The most valuable predictor was the land cover category according to the 1:10,000 base map.
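
The general idea of a similarity-based prediction controlled by feature weights, exemplar weights, and a kernel extent, and evaluated by leave-one-out cross-validation, can be illustrated with a minimal sketch. It is not the algorithm used in the study: the exponential kernel, the parameter `k`, the function names, and the toy case-base are all assumptions made for illustration. The toy data deliberately place one class ("forest") in two separate clusters, the situation in which case-based methods are said to work well.

```python
import math

def similarity(a, b, feature_weights, kernel=1.0):
    """Similarity in (0, 1] that decays with feature-weighted distance.

    The exponential kernel is an assumption, not the study's formula.
    """
    d2 = sum(w * (x - y) ** 2 for w, x, y in zip(feature_weights, a, b))
    return math.exp(-math.sqrt(d2) / kernel)

def predict(query, exemplars, feature_weights, k=3, kernel=1.0):
    """Similarity-weighted vote among the k highest-scoring exemplars.

    exemplars: list of (features, label, exemplar_weight) tuples.
    """
    scored = sorted(
        ((ex_w * similarity(query, feats, feature_weights, kernel), label)
         for feats, label, ex_w in exemplars),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = {}
    for score, label in scored:
        votes[label] = votes.get(label, 0.0) + score
    return max(votes, key=votes.get)

def leave_one_out_accuracy(exemplars, feature_weights, k=3, kernel=1.0):
    """Predict each case from all the others; return the share of hits."""
    hits = 0
    for i, (feats, label, _) in enumerate(exemplars):
        rest = exemplars[:i] + exemplars[i + 1:]
        hits += predict(feats, rest, feature_weights, k, kernel) == label
    return hits / len(exemplars)

# Hypothetical case-base: "forest" occupies two clusters, "mire" lies between.
cases = [
    ((0.0, 0.0), "forest", 1.0),
    ((0.1, 0.2), "forest", 1.0),
    ((0.2, 0.1), "forest", 1.0),
    ((5.0, 5.0), "forest", 1.0),
    ((5.1, 5.2), "forest", 1.0),
    ((5.2, 5.1), "forest", 1.0),
    ((2.5, 2.5), "mire", 1.0),
    ((2.6, 2.4), "mire", 1.0),
    ((2.4, 2.6), "mire", 1.0),
]
weights = (1.0, 1.0)  # equal feature weights, chosen arbitrarily

print(leave_one_out_accuracy(cases, weights))
print(predict((4.9, 5.0), cases, weights))
```

Setting a feature weight to zero removes that feature from the similarity, and setting an exemplar weight to zero removes that case from every vote, so in this sketch feature selection and case filtering are special cases of weighting.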

Full Text