Comparing the performance of two indices for spatial model selection: application to two mortality data
The statistical analysis of spatially correlated data has recently become an important research topic. Analysing the mortality or morbidity rates observed in different areas may help to decide whether people living in certain locations are at higher risk than others. Once the statistical model for the data of interest has been chosen, further effort can be devoted to identifying the areas at higher risk. Many scientists, including statisticians, have used the conditional autoregressive (CAR) model to describe the spatial autocorrelation among the observed data. This model has a greater smoothing effect than exchangeable models, such as the Poisson gamma model for spatial data. This paper focuses on comparing the two types of models using the index LG, the ratio of local to global variability. Two applications, Taiwan asthma mortality and Scotland lip cancer, are considered and the use of LG is illustrated. The estimated values for both data sets are small, implying that a Poisson gamma model may be favoured over the CAR model. We discuss the implications for each of the two applications. To evaluate the performance of the index LG, we also compute the Bayes factor, a Bayesian model selection criterion, to see which model is preferred for the two applications and for simulated data. To derive the value of LG, we estimate its posterior mode based on samples derived from the BUGS program, while for the Bayes factor we use the double Laplace-Metropolis method, the Schwarz criterion, and a modified harmonic mean for approximations. The results of LG and the Bayes factor are consistent. We conclude that LG is fairly accurate as an index for selection between the Poisson gamma and CAR models. When easy and fast computation is of concern, we recommend using LG as the first and less costly index.
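As a rough illustration of one of the approximations named in the abstract, the Schwarz criterion approximates the log Bayes factor of model 1 against model 0 by the difference in maximized log-likelihoods minus a penalty of half the difference in parameter counts times log n. The sketch below is generic and is not the authors' computation; the function name and all numeric inputs are hypothetical, and counting parameters in hierarchical models such as the Poisson gamma or CAR model is itself a judgment call that is not addressed here.

```python
import math


def schwarz_log_bayes_factor(loglik_1, loglik_0, n_params_1, n_params_0, n_obs):
    """Schwarz (BIC-type) approximation to log B10.

    log B10 ~= (loglik_1 - loglik_0) - 0.5 * (n_params_1 - n_params_0) * log(n_obs)

    loglik_* are maximized log-likelihoods; n_params_* are the numbers of
    free parameters; n_obs is the number of observations (e.g. areas).
    """
    penalty = 0.5 * (n_params_1 - n_params_0) * math.log(n_obs)
    return (loglik_1 - loglik_0) - penalty


# Hypothetical inputs, chosen only to show the call.
log_b10 = schwarz_log_bayes_factor(
    loglik_1=-100.0, loglik_0=-103.0, n_params_1=3, n_params_0=2, n_obs=56
)
print(f"approximate log Bayes factor: {log_b10:.3f}")
```

A positive value favours model 1 and a negative value favours model 0; the same quantity equals half the difference in BIC values (BIC of model 0 minus BIC of model 1).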
- Research Article
8
- 10.3390/sym13040545
- Mar 26, 2021
- Symmetry
In spatial data analysis, the conditional autoregressive (CAR) prior model is used to express the spatial dependence of random effects on adjacent regions. This paper proposes extending the existing normal CAR model into a more flexible Fernandez–Steel skew normal (FSSN) CAR model. This approach is able to capture spatial random effects that have both symmetrical and asymmetrical patterns. The FSSN CAR model is built on the basis of the normal CAR with an additional skew parameter. The FSSN distribution is able to provide good estimates for symmetric data with heavy or light tails and for right-skewed and left-skewed data. The effects of this approach are demonstrated by implementing the FSSN distribution and the FSSN CAR model for spatial data in the Stan language. On the basis of the plot of the estimation results and the histogram of the model error, the FSSN CAR model was shown to behave better than both the model without a spatial effect and the normal CAR model. Moreover, the FSSN CAR model has the smallest widely applicable information criterion (WAIC) and leave-one-out (LOO) values, which also supports it as the best of the models considered.
- Research Article
- 10.13189/ms.2025.130603
- Dec 1, 2025
- Mathematics and Statistics
Conditional Autoregressive (CAR) models have been widely used in various disciplines, including epidemiological studies. The application of the CAR model in epidemiological studies is often associated with the relative risk of an infectious disease. This relative risk value can be estimated using the CAR models. Here, we evaluate four commonly used CAR models: the Intrinsic CAR, the Besag-York-Mollié CAR (BYM CAR), the BYM-modified CAR (BYM2 CAR), and the Leroux CAR (LCAR). To estimate CAR model parameters, Bayesian inference and the Integrated Nested Laplace Approximation (INLA) concept are used. The selected model was then used to model the number of dengue hemorrhagic fever (DHF) cases in Central Java Province in 2024. To support this analysis, we used 50 datasets simulated for each sample size (n), ranging from 10 to 100. The results of the study showed that of the four models compared, the best model was BYM2. This model was then used to model DHF cases in 2024 in Central Java Province. The research findings indicate the necessity of controlling population density, optimizing the role of medical personnel, and preparing for increased rainfall to curb the spread of dengue fever. Comprehensive detection and control measures through medical facilities are also required. Meanwhile, based on the coefficient of the altitude variable in the model, altitude has a positive influence on the number of dengue fever cases. Therefore, the conflicting conclusions between the model results and the medical perspective require data verification and further study of this variable.
- Research Article
235
- 10.1002/ecm.1283
- Jan 23, 2018
- Ecological Monographs
Ecological data often exhibit spatial pattern, which can be modeled as autocorrelation. Conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models are network‐based models (also known as graphical models) specifically designed to model spatially autocorrelated data based on neighborhood relationships. We identify and discuss six different types of practical ecological inference using CAR and SAR models, including: (1) model selection, (2) spatial regression, (3) estimation of autocorrelation, (4) estimation of other connectivity parameters, (5) spatial prediction, and (6) spatial smoothing. We compare CAR and SAR models, showing their development and connection to partial correlations. Special cases, such as the intrinsic autoregressive model (IAR), are described. Conditional autoregressive and SAR models depend on weight matrices, whose practical development uses neighborhood definition and row‐standardization. Weight matrices can also include ecological covariates and connectivity structures, which we emphasize, but have been rarely used. Trends in harbor seals (Phoca vitulina) in southeastern Alaska from 463 polygons, some with missing data, are used to illustrate the six inference types. We develop a variety of weight matrices and CAR and SAR spatial regression models are fit using maximum likelihood and Bayesian methods. Profile likelihood graphs illustrate inference for covariance parameters. The same data set is used for both prediction and smoothing, and the relative merits of each are discussed. We show the nonstationary variances and correlations of a CAR model and demonstrate the effect of row‐standardization. We include several take‐home messages for CAR and SAR models, including (1) choosing between CAR and IAR models, (2) modeling ecological effects in the covariance matrix, (3) the appeal of spatial smoothing, and (4) how to handle isolated neighbors. 
We highlight several reasons why ecologists will want to make use of autoregressive models, both directly and in hierarchical models, and not only in explicit spatial settings, but also for more general connectivity models.
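The role of the weight matrix and row-standardization described in this abstract can be sketched numerically. The example below assumes a common CAR parameterization with covariance (I - rho * W_plus)^(-1) M, where W_plus is the row-standardized binary neighbour matrix and M = diag(tau2 / d_i) with d_i the neighbour counts; the four-region map, rho, and tau2 are hypothetical and are not taken from the harbor seal analysis.

```python
import numpy as np

# Hypothetical map: four regions in a row, 0-1-2-3, binary neighbour weights.
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)


def car_covariance(W, rho, tau2=1.0):
    """Covariance of a CAR model with row-standardized weights.

    Assumes every region has at least one neighbour; an isolated region
    (row sum 0) would need separate handling before dividing by d.
    """
    d = W.sum(axis=1)                 # number of neighbours per region
    W_plus = W / d[:, None]           # row-standardized weights (rows sum to 1)
    M = np.diag(tau2 / d)             # conditional variances
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) - rho * W_plus) @ M


sigma = car_covariance(W, rho=0.9)
print(np.round(np.diag(sigma), 3))    # marginal variances differ by region
```

In this toy map the two end regions, which have one neighbour each, get a larger marginal variance than the two interior regions, which is one way to see the nonstationary variances the abstract refers to.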
- Research Article
1
- 10.6288/cjph1998-17-02-09
- Apr 1, 1998
- Canadian Journal of Public Health-revue Canadienne De Sante Publique
Hierarchical models are commonly used in analyzing geographical data. They take account of the random variation in addition to the systematic variability among observations. By specifying a distribution for rates in different areas, various kinds of random mechanisms for variability can be considered. The exchangeable (EX) priors and conditional autoregressive (CAR) priors are the two most common approaches. However, it is unclear how to choose between these two mechanisms. In this study, motivated by the search for the true pattern of the asthma mortality data for Taipei City, we adopt the two competing EX and CAR models to investigate the spatial pattern. With the two hypotheses (the EX or CAR model), we need not only to obtain estimates of quantities of interest but also to choose an appropriate model, since the final decision may result in different etiologic studies. In this paper, we use the fully Bayesian approach with Markov chain Monte Carlo to obtain estimates. Then, we focus on two model selection indices for the asthma study: the Bayes factor and the ratio of the variances (the local effect to the global effect). Based on the study results, we conclude: (1) Both the Bayes factor and the ratio of the local variance to the global variance should be used together for choosing an appropriate model. The Bayes factor offers a direct answer as to which model is favored by the data, while the ratio of variances reflects the characteristics of the data and provides a way to evaluate whether it is necessary to consider the area-specific effect. (2) According to the two indices, the EX model is considered more appropriate for the asthma mortality data, and the rates in Neihu and Nankang are higher than in other areas. The remaining variation among areas for the EX model may be caused by spatially independent variables rather than spatially correlated variables.
- Research Article
- 10.33830/jmst.v24i1.4864.2023
- May 20, 2023
- Jurnal Matematika Sains dan Teknologi
Covid-19 cases in Indonesia occurred for the first time on 2 March 2020. By 30 September 2022, Indonesia had 158,173 Covid-19 deaths. Several studies have modelled Covid-19 cases. However, research modelling the number of Covid-19 deaths using the Bayesian Spatial Conditional Autoregressive (CAR) model is still rare. The Bayesian spatial CAR model has high flexibility in relative risk (RR) modeling. CAR models can include various types of spatial effects and can include covariates in the model. RR represents the ratio of the risk of outcome (Covid-19) in the exposed group compared to the population average (the unexposed group). This study aims to evaluate the BYM, Leroux, and Localised models with five hyperpriors, to obtain the best model for estimating the RR of Covid-19 deaths in Indonesia and to create RR maps. This study used aggregate data on Covid-19 deaths (2 March 2020 - 30 September 2022). Data on the total population and population density of each province in 2021 were also used. The best model selection is based on the lowest Watanabe Akaike Information Criterion (WAIC) and Deviance Information Criterion (DIC) values, and Modified Moran's I (MMI) residual values. The results showed that the CAR BYM model with covariates and with an Inverse-Gamma IG(0.5; 0.0005) prior distribution had the lowest DIC and WAIC. As the BYM model does not converge, it cannot be used in determining the RR of Covid-19 deaths in Indonesia. Of the other three models that converge, the Bayesian CAR Leroux model without covariates with IG(0.5; 0.0005) has the lowest DIC (393.76) and WAIC (400.12), and its MMI value (-0.26) is close to zero. Therefore, the Bayesian CAR Leroux model without covariates with IG(0.5; 0.0005) is preferred. The provinces with the highest RR (2.76) and the lowest RR (0.22) are Yogyakarta and Papua, respectively.
- Research Article
6
- 10.1002/env.2346
- May 14, 2015
- Environmetrics
The use of conditional autoregressive (CAR) models for spatial effects is commonplace, especially when dealing with aggregated count data in health studies. CAR models are convenient and relatively easy to implement but suffer from the fact that they have limited flexibility in modeling correlation. We introduce a new CAR model that can accommodate different neighborhood features (including shared neighbors). Further, we examine via simulation how this model performs in comparison with standard CAR models. We also consider the application to a small area health data example. Copyright © 2015 John Wiley & Sons, Ltd.
- Research Article
27
- 10.1016/j.csda.2011.11.011
- Nov 22, 2011
- Computational Statistics & Data Analysis
Gaussian component mixtures and CAR models in Bayesian disease mapping
- Research Article
- 10.12691/ajams-12-3-3
- Aug 5, 2024
- American Journal of Applied Mathematics and Statistics
Establishing the patterns of a disease, or disease mapping, is very important in disease control and prevention. The level of accuracy that is achieved at this stage determines the effectiveness of the control measures to be developed. Disease mapping has been widely done using the frequentist approach, which is limited in that it does not consider the prior probability distribution of a phenomenon. This limitation leads to lower levels of accuracy and validity. This study proposed a Bayesian approach for mapping tuberculosis incidence in Meru County, Kenya. A correlational research design was utilized to determine the association between TB cases and the geographical locations where the cases were positively identified. Secondary data from the Meru County Health Records were used for this study. Spatial autocorrelation analysis was performed to determine patterns of TB incidence. The study applied the Conditional Autoregressive (CAR) model and the Poisson Lognormal (PLN) model under the Bayesian approach to model TB incidence in order to determine spatio-temporal trends. Parameter estimation for the models was done using Gibbs sampling under Markov Chain Monte Carlo (MCMC). The two models (PLN and CAR) were compared using the Deviance Information Criterion (DIC) to determine the one that had a better fit. Moran's I statistic was -0.3150 (p>0.05), meaning that there was no spatial autocorrelation for TB incidence in Meru County. Model results further indicated that there was no spatial dependence for TB incidence in Meru County. The DIC values obtained were 0.22541 for the CAR model and 0.56723 for the PLN model, meaning that the CAR model had outperformed the PLN model. The study concluded that the CAR model is more effective for disease mapping since it incorporates information from neighboring regions directly into the model to increase the accuracy of estimates. 
Therefore, the study recommended use of Bayesian modelling for disease mapping as it incorporates prior information to stabilize the parameter estimates.
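For readers unfamiliar with the Moran's I statistic reported in this abstract, a minimal sketch of the global statistic follows. It uses the standard formula I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z holds deviations from the mean and S0 is the sum of all weights. The neighbour matrix and values are hypothetical, not the Meru County data, and no significance test is computed here.

```python
import numpy as np


def morans_i(values, W):
    """Global Moran's I for area values and a spatial weight matrix W."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()                      # deviations from the overall mean
    s0 = W.sum()                          # sum of all spatial weights
    return (len(x) / s0) * (z @ W @ z) / (z @ z)


# Hypothetical map: four regions in a row, 0-1-2-3, binary neighbour weights.
W_line = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

trend = morans_i([1.0, 2.0, 3.0, 4.0], W_line)          # neighbours are similar
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W_line)  # neighbours are opposite
print(trend, alternating)
```

Values above the null expectation of -1/(n-1) suggest that neighbouring areas tend to be alike, and values below it suggest that neighbours tend to differ; whether a given value is significant depends on a permutation or normal-approximation test that this sketch omits.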
- Research Article
28
- 10.1007/s10651-012-0201-8
- Apr 29, 2012
- Environmental and Ecological Statistics
Smoothing risks is one of the main goals in disease mapping as classical measures, such as standardized mortality ratios, can be extremely variable. However, smoothing risks might hinder the detection of high risk areas, since these two objectives are somewhat contradictory. Most of the work on smoothing risks and detection of high risk areas has been derived using conditional autoregressive (CAR) models. In this work, penalized splines (P-splines) models are also investigated. Confidence intervals for the log-relative risk predictor will be derived as a tool to detect high-risk areas. The performance of P-spline and CAR models will be compared in terms of smoothing (relative bias), sensitivity (ability to detect high risk areas), and specificity (ability to discard false patterns created by noise) through a simulation study based on the well-known Scottish lip cancer data.
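The variability of standardized mortality ratios in small areas, and the effect of smoothing, can be seen in a small numeric sketch. The SMR is the observed count divided by the expected count. As a simple stand-in for a smoothing model (not the CAR or P-spline models compared in this study), the example uses the conjugate Poisson-gamma result: if O ~ Poisson(E * theta) and theta ~ Gamma(a, rate b), the posterior mean of theta is (a + O) / (b + E). The counts and the hyperparameters a and b are hypothetical.

```python
def smr(observed, expected):
    """Standardized mortality ratio: observed over expected count."""
    return observed / expected


def poisson_gamma_posterior_mean(observed, expected, a, b):
    """Posterior mean of the relative risk under a Gamma(a, rate=b) prior."""
    return (a + observed) / (b + expected)


# Hypothetical small area: 3 deaths observed where 1.0 was expected.
small_raw = smr(3, 1.0)
small_smooth = poisson_gamma_posterior_mean(3, 1.0, a=2.0, b=2.0)

# Hypothetical large area with the same raw ratio: 300 observed, 100 expected.
large_raw = smr(300, 100.0)
large_smooth = poisson_gamma_posterior_mean(300, 100.0, a=2.0, b=2.0)

print(small_raw, round(small_smooth, 3))
print(large_raw, round(large_smooth, 3))
```

Both areas have the same raw SMR, but the small area's estimate is pulled strongly toward the prior mean a / b = 1 while the large area's estimate barely moves. This illustrates the tension the abstract describes: smoothing stabilizes noisy ratios but can also mask a genuinely high-risk small area.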
- Research Article
- 10.1007/s44199-025-00122-1
- Jun 23, 2025
- Journal of Statistical Theory and Applications
The modelling of property prices has been extensively studied in econometrics, with widely used approaches including generalised linear regression and geographically weighted regression. These models commonly address local spatial correlations observed in property price data. However, despite its potential to capture spatial effects, the conditional autoregressive (CAR) model remains underutilised for this purpose. This study examines the robustness and predictive power of the CAR model, comparing it with established spatial models across three different data-generation settings. An illustrative case study on Lombok house price data is also included. Simulation results showed that the CAR model has a distinct advantage, achieving lower bias and variability than other spatial regression models, effectively capturing neighbourhood-based spatial relationships, and exhibiting strong predictive power across various scenarios. In the Lombok case study, the CAR model outperformed other models, providing more precise estimates for property-related factors such as land size and built-up area. The results confirm that CAR's spatial framework enables a nuanced analysis of property values across regions, enhancing accuracy in predictive models. This study also reveals the distinct strengths and limitations of each model, offering insights into their predictive accuracy and applicability across diverse real estate contexts.
- Research Article
43
- 10.1016/j.csda.2008.08.010
- Aug 15, 2008
- Computational statistics & data analysis
A stochastic neighborhood conditional autoregressive model for spatial data
- Research Article
3
- 10.6000/1929-6029.2015.04.04.1
- Nov 2, 2015
- International Journal of Statistics in Medical Research
Problem: The 2014 Ebola virus outbreak in Western Africa is the worst in history. It is imperative that appropriate statistical and mathematical models are used to identify risk factors and to monitor the development and spread of the disease. Method: Data on deaths due to Ebola virus disease (EVD) in Guinea, Liberia, and Sierra Leone from October 10, 2014 to March 24, 2015 were collected from Situation Reports published by the World Health Organization [1]. Conditional autoregressive (CAR) models were applied to account for the spatial dependency in the countries along with the temporal dimension of the disease. Bayesian change-point models were used to identify key change points in the growth and decline of the spatial distribution of deaths due to EVD within each country. Country-specific Poisson and negative binomial mixed models of covariate effects were applied to understand the between-country variability in deaths due to EVD. Results: Both CAR models and generalized linear mixed models identified statistically significant covariate effects; however, the CAR models depended on the interval of data analyzed, whereas the mixed models depended on the underlying distribution assumed. Bayesian change-point models identified one significant change-point in the distribution of deaths due to EVD within each country. Practical Application: CAR models, Bayesian change-point models, and generalized linear mixed models are useful techniques for modeling the incidence of deaths due to EVD.
- Research Article
5
- 10.1002/ece3.3201
- Jul 18, 2017
- Ecology and Evolution
Aim: To assess the importance of variation in observer effort between and within bird atlas projects and demonstrate the use of relatively simple conditional autoregressive (CAR) models for analyzing grid‐based atlas data with varying effort. Location: Pennsylvania and West Virginia, United States of America. Methods: We used varying proportions of randomly selected training data to assess whether variations in observer effort can be accounted for using CAR models and whether such models would still be useful for atlases with incomplete data. We then evaluated whether the application of these models influenced our assessment of distribution change between two atlas projects separated by twenty years (Pennsylvania), and tested our modeling methodology on a state bird atlas with incomplete coverage (West Virginia). Results: Conditional autoregressive models which included observer effort and landscape covariates were able to make robust predictions of species distributions in cases of sparse data coverage. Further, we found that CAR models without landscape covariates performed favorably. These models also account for variation in observer effort between atlas projects and can have a profound effect on the overall assessment of distribution change. Main conclusions: Accounting for variation in observer effort in atlas projects is critically important. CAR models provide a useful modeling framework for accounting for variation in observer effort in bird atlas data because they are relatively simple to apply and quick to run.
- Research Article
2
- 10.4081/gh.2024.1321
- Oct 3, 2024
- Geospatial health
Stunting continues to be a significant health issue, particularly in developing nations, with Indonesia ranking third in prevalence in Southeast Asia. This research examined the risk of stunting and influencing factors in Indonesia by implementing various Bayesian spatial conditional autoregressive (CAR) models that include covariates. A total of 750 models were run, including five different Bayesian spatial CAR models (Besag-York-Mollie (BYM), CAR Leroux and three forms of localised CAR), with 30 covariate combinations and five different hyperprior combinations for each model. The Poisson distribution was employed to model the counts of stunting cases. After a comprehensive evaluation of all model selection criteria utilized, the Bayesian localised CAR model with three covariates was preferred, either allowing up to 2 clusters with a variance hyperprior of inverse-gamma (1, 0.1) or allowing 3 clusters with a variance hyperprior of inverse-gamma (1, 0.01). Poverty and recent low birth weight (LBW) births are significantly associated with an increased risk of stunting, whereas child diet diversity is inversely related to the risk of stunting. Model results indicated that Sulawesi Barat Province has the highest risk of stunting and DKI Jakarta Province the lowest. Areas with high stunting risk require interventions to reduce poverty and LBW births and to increase child diet diversity.
- Research Article
8
- 10.1007/s11284-010-0732-0
- May 26, 2010
- Ecological Research
We tested the effectiveness of distribution‐prediction models for four rare herbaceous wetland species in the Watarase wetland, Japan, based on data obtained from aerial images. We used visible and near‐infrared aerial images from three seasons, and elevations and vegetation heights derived from the images. Because spatial autocorrelation in species distribution data often biases the estimated effects of certain variables and reduces the prediction accuracy of distribution models, we compared the predictions of an intrinsic conditional autoregressive (CAR) model, which accounts for spatial autocorrelation, with those of a standard logistic regression model. The four study species had different distribution patterns: Ophioglossum namegatae and Impatiens ohwadae had aggregated distributions, whereas Galium tokyoense and Thalictrum simplex var. brevipes had scattered distributions. Predictions based on remote sensing images performed well for O. namegatae with the intrinsic CAR model and for I. ohwadae with both the logistic and CAR models; performance was poor for G. tokyoense and T. simplex var. brevipes with both models. Relative to the logistic model, the CAR model improved prediction accuracy most for O. namegatae and least for I. ohwadae. The distribution of I. ohwadae was explained well by ground height. In contrast, the apparent improvement in the prediction for O. namegatae resulted from a substantial spatial random effect, suggesting the presence of determinants that could not be detected by remote sensing. The number of explanatory variables with large effects decreased in the intrinsic CAR model for three species, possibly because the model avoids spatial pseudoreplication, but not for T. simplex var. brevipes.