Bayesian approach for sample size determination, illustrated with Soil Health Card data of Andhra Pradesh (India)

D.J. Brus, B. Kempen, D. Rossiter, Balwinder-Singh, A.J. McDonald

doi:10.1016/j.geoderma.2021.115396

Abstract

A crucial decision in designing a spatial sample for soil survey is the number of sampling locations required to answer, with sufficient accuracy and precision, the questions posed by decision makers at different levels of geographic aggregation. In the Indian Soil Health Card (SHC) scheme, many thousands of locations are sampled per district. In this paper the SHC data are used to estimate the mean of a soil property within a defined study area, e.g., a district, or the areal fraction of the study area where some condition is satisfied, e.g., exceedance of a critical level. The central question is whether this large sample size is needed for this aim. The sample size required for a given maximum length of a confidence interval can be computed with formulas from classical sampling theory, using a prior estimate of the variance of the property of interest within the study area. Similarly, for the areal fraction a prior estimate of this fraction is required. In practice we are uncertain about these prior estimates, and our uncertainty is not accounted for in classical sample size determination (SSD). This deficiency can be overcome with a Bayesian approach, in which the prior estimate of the variance or areal fraction is replaced by a prior distribution. Once new data from the sample are available, this prior distribution is updated to a posterior distribution using Bayes’ rule. The apparent problem with a Bayesian approach prior to a sampling campaign is that the data are not yet available. This dilemma can be solved by computing, for a given sample size, the predictive distribution of the data, given a prior distribution on the population and design parameter. Thus we do not have a single vector with data values, but a finite or infinite set of possible data vectors. As a consequence, we have as many posterior distribution functions as we have data vectors. This leads to a probability distribution of lengths or coverages of Bayesian credible intervals, from which various criteria for SSD can be derived. Besides the fully Bayesian approach, a mixed Bayesian-likelihood approach for SSD is available. This is of interest when, after the data have been collected, we prefer to estimate the mean from these data only, using the frequentist approach, ignoring the prior distribution. The fully Bayesian and mixed Bayesian-likelihood approaches are illustrated for estimating the mean of log-transformed Zn and the areal fraction with Zn deficiency, defined as a Zn concentration <0.9 mg kg⁻¹, in the thirteen districts of Andhra Pradesh state. The SHC data from 2015–2017 are used to derive prior distributions. For all districts the Bayesian and mixed Bayesian-likelihood sample sizes are much smaller than the current sample sizes. The hyperparameters of the prior distributions have a strong effect on the sample sizes. We discuss methods to deal with this. Even at the mandal (sub-district) level the sample size can almost always be reduced substantially. Clearly SHC over-sampled, and here we show how to reduce the effort while still providing information required for decision-making. R scripts for SSD are provided as supplementary material.
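
The classical formulas and the predictive-distribution idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' supplementary R code. All function names, the gamma prior on the precision (inverse variance), and the numerical values are hypothetical choices made for illustration. The sketch assumes simple random sampling without a finite population correction, uses the normal quantile instead of the t quantile (a large-sample simplification), and uses an average-length criterion as one possible SSD criterion in the spirit of the mixed Bayesian-likelihood approach.

```python
import math
import random
from statistics import NormalDist


def classical_n_mean(variance, max_length, alpha=0.05):
    """Classical SSD for a mean: smallest n with CI length 2*z*sigma/sqrt(n) <= max_length."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z) ** 2 * variance / max_length ** 2)


def classical_n_fraction(fraction, max_length, alpha=0.05):
    """Classical SSD for an areal fraction, using the variance p*(1-p) of an indicator."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z) ** 2 * fraction * (1 - fraction) / max_length ** 2)


def average_length(n, shape, rate, alpha=0.05, draws=2000, seed=1):
    """Monte Carlo average of the frequentist interval length over the prior predictive.

    Assumption: precision ~ Gamma(shape, rate). For each draw, the sample variance of
    n normal observations is simulated via s2 = sigma2 * chi2(n-1) / (n-1).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    prior_rng = random.Random(seed)      # same prior draws for every n
    data_rng = random.Random(seed + 1)   # randomness of the not-yet-collected data
    total = 0.0
    for _ in range(draws):
        sigma2 = 1.0 / prior_rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        chi2 = data_rng.gammavariate((n - 1) / 2, 2.0)            # chi-square with n-1 df
        s2 = sigma2 * chi2 / (n - 1)
        total += 2 * z * math.sqrt(s2 / n)
    return total / draws


def alc_sample_size(max_length, shape, rate, alpha=0.05, n_max=100_000):
    """Find, by bisection, an n whose average interval length is <= max_length."""
    def ok(n):
        return average_length(n, shape, rate, alpha) <= max_length

    if ok(2):
        return 2
    if not ok(n_max):
        raise ValueError("criterion not met for n <= n_max")
    lo, hi = 2, n_max  # invariant: lo fails the criterion, hi meets it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical inputs: prior variance 1.0, maximum interval length 0.2.
    print(classical_n_mean(1.0, 0.2))
    print(classical_n_fraction(0.5, 0.1))
    print(alc_sample_size(0.2, shape=5.0, rate=4.0))
```

The classical functions need a single prior guess of the variance or fraction, whereas `alc_sample_size` averages the interval length over many plausible variances and simulated samples, which is how uncertainty about the prior estimate enters the sample size.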

Highlights

This research was motivated by the desire to evaluate the sampling efficiency of the nationally-mandated Soil Health Card (SHC) Scheme in India.
For all districts the effect of n0 on the mixed Bayesian-likelihood sample sizes is strong for small prior sample sizes, but levels off.
Though we acknowledge that the SHC scheme is oriented towards field management, we have shown that for district-level assessments all sample sizes are substantially smaller than the current sample sizes applied in the SHC scheme (Table 1).

Summary

Introduction

This research was motivated by the desire to evaluate the sampling efficiency of the nationally-mandated Soil Health Card (SHC) Scheme in India. This scheme specifies soil sampling at a very high density every two years. Cycle 2 (2017/18–2018/19) recorded 23,938,875 observations, a density of 14.7 km⁻², or one per 6.8 ha. This is consistent with the SHC policy of one soil sample per 10 ha in rainfed and one per 2.5 ha in irrigated areas. Sampling locations are not necessarily revisited in subsequent sampling rounds.



Journal: Geoderma	Publication Date: Sep 9, 2021
Citations: 1	License type: cc-by
