Abstract
Small data samples remain a critical challenge for spatial prediction. Land use regression (LUR) is a widely used model for spatial prediction from observations at a limited number of locations. Studies have demonstrated that LUR models can overcome a limitation of other spatial prediction models, which usually require a greater spatial density of observations. However, the prediction accuracy and robustness of LUR models still need to be improved because of the linear regression within the LUR model. To improve LUR models, this study develops a land use quantile regression (LUQR) model for more accurate spatial prediction from small data samples. LUQR integrates LUR with quantile regression, both of which have advantages for prediction from a small number of samples. In this study, the LUQR model is applied to predict the spatial distribution of annual mean PM2.5 concentrations across the Greater Sydney Region, New South Wales, Australia, using observations from 19 valid monitoring stations in 2020. Cross validation shows that, compared with LUR, LUQR models improve the goodness-of-fit by 25.6–32.1% and reduce the prediction root mean squared error (RMSE) and mean absolute error (MAE) by 10.6–13.4% and 19.4–24.7%, respectively. This study also indicates that LUQR is a more robust model than LUR for spatial prediction with small data samples. Thus, LUQR has great potential to be widely applied to spatial problems with a limited number of observations.
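The error comparison reported above can be illustrated with a minimal sketch of how RMSE, MAE, and their relative reduction are computed. The observations and predictions below are hypothetical placeholders, not data from the study, and the function names are chosen only for this illustration.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def percent_reduction(baseline, candidate):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical held-out observations and predictions (not from the study).
observed = [6.0, 7.0, 8.0, 9.0]
pred_baseline = [7.0, 6.0, 9.0, 7.0]     # stand-in for a baseline model such as LUR
pred_candidate = [6.5, 6.5, 8.5, 8.0]    # stand-in for an alternative such as LUQR

rmse_drop = percent_reduction(rmse(observed, pred_baseline),
                              rmse(observed, pred_candidate))
mae_drop = percent_reduction(mae(observed, pred_baseline),
                             mae(observed, pred_candidate))
print(f"RMSE reduced by {rmse_drop:.1f}%, MAE reduced by {mae_drop:.1f}%")
```

In a cross-validation setting, the predictions would come from models fitted without the held-out stations; the abstract does not state which cross-validation scheme was used, so none is assumed here.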
Highlights
Small data samples have been a critical challenge for the prediction of geographical attributes [1]
This study developed a land use quantile regression (LUQR) model for more accurate spatial predictions of air pollution
The LUQR model integrates land use regression (LUR) with quantile regression, both of which have advantages in robust modeling with a small number of observations
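The robustness attributed to quantile regression comes from the loss it minimizes. The sketch below shows the standard check (pinball) loss and, for the simplest case of a constant prediction, how the median fit (quantile level 0.5) compares with the mean when one value is outlying. The concentration values are hypothetical, and this is not the LUQR model itself, which also involves land use predictors.

```python
from statistics import mean

def pinball_loss(y, pred, tau):
    """Average check (pinball) loss of predictions at quantile level tau."""
    total = 0.0
    for yi, pi in zip(y, pred):
        r = yi - pi
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y)

def best_constant(y, tau):
    """Constant prediction minimising the pinball loss.

    A minimiser always exists among the sample values, so searching
    over them is sufficient for this illustration.
    """
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Hypothetical annual mean concentrations with one outlying station.
obs = [6.1, 6.4, 6.8, 7.0, 7.3, 7.5, 15.0]

median_fit = best_constant(obs, 0.5)   # quantile-regression style fit
mean_fit = mean(obs)                   # least-squares style fit
print(median_fit, round(mean_fit, 2))
```

The outlying value pulls the mean upward, while the fit at quantile level 0.5 stays at the sample median; fitting other values of `tau` would describe other parts of the distribution.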
Summary
Small data samples have been a critical challenge for the prediction of geographical attributes [1]. Models for spatial prediction require a certain number of samples or observations, but in some cases it is difficult to collect enough of them. Small data samples usually arise from several factors. Historical data usually contain a limited number of samples, such as meteorological observations from the previous century [6]. It is also difficult to collect enough samples for specific and uncommon attributes or for some regions. For example, the distribution of global in situ soil moisture monitoring stations is critically unbalanced [7,8]. Air pollution monitoring stations are usually limited in number within a city, which makes regional spatial prediction of air pollution difficult [11,12].