Abstract

Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as the maximum spatial cluster size, and can be improved by selecting parameters with performance measures. Current performance measures rely on the presence of known clusters and are thus inapplicable to datasets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selecting the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

Highlights

  • The spatial scan statistic, introduced by Kulldorff[1], detects the presence and location of geographic clusters within spatial datasets

  • Ribeiro and Costa[5] investigated the performance of spatial scan statistics with different maximum spatial cluster sizes, including secondary clusters; they suggested that three performance measures are sensitive to the maximum spatial cluster size

  • Spatial scan statistics are widely used in different fields to identify unusual clustering events throughout the study region

Summary

Introduction

The spatial scan statistic, introduced by Kulldorff[1], detects the presence and location of geographic clusters within spatial datasets. The maximum spatial cluster size is the only parameter that users must select when applying the commonly used circular spatial scan statistic in SaTScan software. Performance measures at the aggregation level are commonly computed over datasets generated from a similar underlying model because they can detect slight differences among spatial scan statistics with different parameters. Such datasets, however, do not exist in reality. An overall performance measure based on the applied dataset, rather than on the known presence of true clusters, can therefore be used to select the optimal spatial parameters and improve the performance of spatial scan statistics in applications.
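
To make the role of the maximum spatial cluster size concrete, the following is a minimal Python sketch of a circular scan under the Poisson model, using the standard Kulldorff log-likelihood ratio. It is an illustration only: it is not SaTScan's implementation, it does not compute MCS-P (whose formula is not given in this summary), and it omits the Monte Carlo significance test. The names `poisson_llr`, `circular_scan`, and `max_frac` are hypothetical and chosen for this example.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one candidate zone.

    c: observed cases in the zone, e: expected cases under the null,
    total: cases in the whole study region. Only excess-risk zones score.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = 0.0
    if total > c:
        outside = (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def circular_scan(coords, cases, pop, max_frac=0.5):
    """Return (best_llr, zone) over circles centred on each region centroid.

    max_frac is the maximum spatial cluster size, expressed here as a
    fraction of the total population at risk (an assumption of this sketch).
    """
    total_cases = sum(cases)
    total_pop = sum(pop)
    best = (0.0, ())
    for cx, cy in coords:
        # Grow the circle by adding regions in order of distance from the centre.
        order = sorted(
            range(len(coords)),
            key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy),
        )
        c = 0
        p = 0
        zone = []
        for i in order:
            p += pop[i]
            if p > max_frac * total_pop:
                break  # circle would exceed the maximum cluster size
            c += cases[i]
            zone.append(i)
            e = total_cases * p / total_pop
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, tuple(sorted(zone)))
    return best


# Toy data: four regions on a line, with excess cases in the first two.
coords = [(0, 0), (1, 0), (5, 0), (6, 0)]
cases = [10, 12, 1, 1]
pop = [100, 100, 100, 100]
wide = circular_scan(coords, cases, pop, max_frac=0.5)
narrow = circular_scan(coords, cases, pop, max_frac=0.25)
```

On this toy data the most likely cluster covers regions 0 and 1 when `max_frac` is 0.5, but shrinks to region 1 alone when `max_frac` is 0.25, which shows how the reported cluster depends on the chosen maximum size.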

Methods
Only the result of MCS-P is close to the optimal result
Findings
Discussion