Abstract

Spatial cluster detection is a classical question in epidemiology: are cases located near other cases? In order to classify a study area into zones of different risks and determine their boundaries, we have developed a spatial partitioning method based on oblique decision trees, called spatial oblique decision tree (SpODT). This non-parametric method is based on the classification and regression tree (CART) approach introduced by Leo Breiman. Applied to epidemiological spatial data, the algorithm recursively searches among the coordinates for a threshold or a boundary between zones, so that the risks estimated in these zones are as different as possible. While the CART algorithm leads to rectangular zones, providing perpendicular splits of longitudes and latitudes, the SpODT algorithm provides oblique splitting of the study area, which is more appropriate and accurate for spatial epidemiology. Oblique decision trees can be considered as non-parametric regression models. Beyond the basic function, we have developed a set of functions that enable extended analyses of spatial data, providing: inference, graphical representations, spatio-temporal analysis, adjustment for covariates, spatially weighted partitioning, and the gathering of similar adjacent final classes. In this paper, we propose a new R package, SPODT, which provides an extensible set of functions for partitioning spatial and spatio-temporal data. The implementation and extensions of the algorithm are described. Usage examples of the functions are given, searching for clusters of malaria episodes in Bandiagara, Mali, and in samples showing three different cluster shapes.

Highlights

  • Spatial cluster detection is a classical question in epidemiology: are cases located near other cases? Among various approaches, general methods allow us to detect high risk zones of unspecified locations within a study area, without specifying any a priori point source (Colonna, Esteve, and Menegoz 1993; Elliott, Martuzzi, and Shaddick 1995; Wakefield, Quinn, and Rabb 2001; Waller and Gotway 2004; Chirpaz, Colonna, and Viel 2004; Gaudart, Ramatriravo, and Giusiano 2006b).

  • We have introduced a spatial partitioning method based on oblique decision trees, called spatial oblique decision tree (SpODT), in order to classify a study area into zones of different risks and determine their boundaries, while being less sensitive to edge effects (Gaudart et al. 2006b).

  • We have developed a set of functions for an extended analysis of spatial data, providing: inference, graphical representations, spatio-temporal analysis, adjustments on quantitative covariates, spatial weighted partition, and the gathering of similar adjacent final classes


Summary

Introduction

General methods allow us to detect high risk zones of unspecified locations within a study area, without specifying any a priori point source (Colonna, Esteve, and Menegoz 1993; Elliott, Martuzzi, and Shaddick 1995; Wakefield, Quinn, and Rabb 2001; Waller and Gotway 2004; Chirpaz, Colonna, and Viel 2004; Gaudart, Ramatriravo, and Giusiano 2006b). By scanning the study region with a circular or elliptic window, the SaTScan algorithm (Kulldorff 1997) compares observed and expected cases inside and outside each potential cluster. Although it has the advantage of not depending on the underlying spatial architecture, the choice of windowing is often critical and sensitive to edge effects (Gregorio, Samociuk, DeChello, and Swede 2006). The SPODT package is freely available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=SPODT.

Basic algorithm
Program developments
Basic function
Hypothesis testing
Data examples
Different cluster shapes and levels
Spatial partition with a time covariate
Conclusion
