Abstract

This paper addresses the problem of balancing statistical economic data when the data structure is arbitrary and both uncertainty estimates and a ranking of data quality are available. Using a Bayesian approach, the prior configuration is described as a multivariate random vector and the balanced posterior is obtained by relative entropy minimization. The paper shows that conventional data balancing methods, such as generalized least squares, weighted least squares and biproportional methods, are particular cases of the general method described here. As a consequence, it is possible to determine the underlying assumptions and range of application of each traditional method. In particular, the popular biproportional method is found to assume that all source data have the same relative uncertainty. Finally, this paper proposes a simple linear iterative method that generalizes the biproportional method to the data balancing problem with arbitrary data structure, uncertainty estimates and multiple data quality levels.

Highlights

  • In the compilation of statistical economic data, such as a census-based Input-Output (IO) table or a social-accounting matrix (SAM), it is often the case that the data is not balanced, i.e., row and column sums do not add up [1]

  • This paper addresses the problem of balancing an IO table with arbitrary structure, uncertainty estimates and multiple data sources

  • The concept of data quality determines the sequence in which the data balancing procedure is implemented and how numerical constraints are constructed


Introduction

In the compilation of statistical economic data, such as a census-based Input-Output (IO) table or a social-accounting matrix (SAM), it is often the case that the data is not balanced, i.e., row and column sums do not add up [1]. No existing data balancing method simultaneously handles arbitrary data structure, uncertainty estimates and multiple data quality levels, even though all of these issues arise in the compilation of multi-regional IO models. In this paper, this problem is solved using concepts and techniques of Bayesian inference [6]. The analytical solution is impractical, so a series of numerical approximations is derived, whose validity depends on the amount of uncertainty information initially available. After this derivation, conventional data balancing methods are reviewed and a one-to-one correspondence between the conventional methods and the numerical approximations is identified. The Bayesian linear algorithm (recommended for most practical applications) turns out to be a generalization of the classical RAS method to the situation of arbitrary structure, uncertainty information and data quality hierarchy.
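For reference, the classical RAS (biproportional) method that the paper generalizes can be sketched as an alternating rescaling of rows and columns until the table matches prescribed margins. The function name, tolerance and convergence check below are illustrative, not taken from the paper:

```python
import numpy as np

def ras_balance(A, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Classical RAS (biproportional) balancing.

    Alternately rescales the rows and columns of the prior table A
    until its row and column sums match the target margins. The
    two target vectors must sum to the same grand total.
    """
    X = A.astype(float).copy()
    for _ in range(max_iter):
        # Scale each row so row sums hit the row targets.
        r = row_targets / X.sum(axis=1)
        X = X * r[:, None]
        # Scale each column so column sums hit the column targets.
        s = col_targets / X.sum(axis=0)
        X = X * s[None, :]
        # Stop once row sums are also satisfied after column scaling.
        if np.allclose(X.sum(axis=1), row_targets, atol=tol):
            break
    return X

# Example: a 2x2 flow table whose margins disagree with the targets.
A = np.array([[10.0, 4.0],
              [6.0, 2.0]])
X = ras_balance(A,
                row_targets=np.array([15.0, 7.0]),
                col_targets=np.array([14.0, 8.0]))
```

As the abstract notes, this procedure implicitly treats all source entries as having the same relative uncertainty; the paper's linear algorithm relaxes that assumption while retaining the same iterative structure.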

Problem Formulation
Analytical Solution
Data Quality
Numerical Approximations
The GLS Algorithm
The WLS Algorithm
The Proportional Algorithm
The Linear Algorithm
Proportional and Cross-Entropy Methods
Least-Squares Methods
Discussion
Empirical Considerations
Conclusions
Treatment of Zero and Negative Entries
Stochastic First and Second Moment Constraints
GLS Algorithm
Maximally Uninformative Aggregate Data
A Single Accounting Identity
Derivation of the WLS Algorithm
