The physical aspects of commodity trade are becoming increasingly important on a global scale for transportation planning, demand management for transportation facilities and services, energy use, and environmental concerns. Such aspects (for example, weight and volume) of commodities are vital for the logistics industry, as they allow for medium-to-long-term planning at the strategic level and help identify commodity flow trends. However, incomplete physical commodity trade databases impede proper analysis of trade flows between countries. Physical values may be missing for many reasons, such as (1) non-compliance of reporter countries with the regulations prescribed by the World Customs Organization (WCO), (2) confidentiality issues, (3) delays in data processing, or (4) erroneous reporting. Traditional missing-data imputation methods, such as substitution by the mean, by linear interpolation/extrapolation using adjacent points, by regression, and by stochastic regression, have been proposed for estimating the physical aspects of commodity trade data. However, a major drawback of these single imputation methods is their failure to account for the uncertainty associated with missing data. Advances in computer technology have recently made it possible to use computationally complex stochastic methods to improve the accuracy of imputed data. Therefore, this study proposes a data augmentation algorithm to impute missing physical commodity trade data. The key advantage of the proposed approach is that, instead of using a point estimate as the imputed value, it simulates a distribution of the missing data through multiple imputations to reflect uncertainty and to maintain variability in the data.
This approach also provides the flexibility to incorporate fundamental distributional properties of the variables, such as physical quantity, monetary value, price elasticity of demand, price variation, and product differentiation, together with their correlations, to generate reasonable averaged estimates for statistical inference. An overview of the most commonly used data imputation approaches and their limitations is presented, followed by the theoretical basis and imputation procedure of the proposed approach. Lastly, a case study is presented to demonstrate the merits of the proposed approach in comparison to traditional imputation methods.