Abstract

Missing data reconstruction is a critical step in the analysis and mining of spatio-temporal data; however, few studies comprehensively consider missing data patterns, sample selection and spatio-temporal relationships. As a result, traditional methods often fail to achieve satisfactory accuracy or to cope with high levels of complexity. To address these problems, this study developed an effective two-step method for spatio-temporal missing data reconstruction (ST-2SMR). The approach first applies a coarse-grained interpolation that takes missing data patterns into account, which eliminates the influence of continuous missing data on the overall results. Based on the coarse-grained results, a dynamic sliding window selection algorithm was then implemented to determine the most relevant sample data for fine-grained interpolation, considering both spatial and temporal heterogeneity. Finally, the spatial and temporal interpolation results were integrated using a neural network model. We validated the approach on Beijing air quality data and found that it outperforms existing solutions in terms of estimation accuracy and reconstruction rate.
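
The coarse-then-fine idea in the abstract can be illustrated with a small sketch. It is not the authors' ST-2SMR implementation: the summary does not say which interpolator the coarse-grained step uses or how the window adapts, so this sketch assumes linear interpolation in time for the coarse pass and a fixed-width window in which the other stations are ranked by Pearson correlation with the target station, computed on the coarse-filled data. The names `coarse_fill`, `pearson` and `select_neighbors` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch, not the authors' ST-2SMR code.
# A station's readings over time; None marks a missing value.
Series = List[Optional[float]]


def coarse_fill(series: Series) -> List[float]:
    """Assumed coarse-grained pass: fill every gap by linear interpolation in
    time, holding the nearest observed value at the series edges."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(series[right])  # leading gap: copy first observation
        elif right is None:
            filled.append(series[left])  # trailing gap: copy last observation
        else:
            frac = (i - left) / (right - left)
            filled.append(series[left] + frac * (series[right] - series[left]))
    return filled


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_neighbors(filled: List[List[float]], target: int, t: int,
                     half_width: int, k: int) -> List[Tuple[int, float]]:
    """Assumed fixed-width window: rank the other stations by correlation with
    the target inside [t - half_width, t + half_width] and keep the top k."""
    lo = max(0, t - half_width)
    hi = min(len(filled[target]), t + half_width + 1)
    window = filled[target][lo:hi]
    scores = [(s, pearson(window, series[lo:hi]))
              for s, series in enumerate(filled) if s != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Usage: station 0 is missing its reading at t = 2.
stations: List[Series] = [
    [1.0, 2.0, None, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
filled = [coarse_fill(s) for s in stations]
print(filled[0])  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(select_neighbors(filled, target=0, t=2, half_width=2, k=1))  # [(1, 1.0)]
```

Ranking by signed correlation is one possible choice; ranking by absolute correlation would also keep strongly anti-correlated stations as candidate samples.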

Highlights

  • A large number of interpolation methods have been proposed to address missing spatio-temporal data [4,5,6,7,8,9,10]. These methods can be roughly divided into three categories: spatial interpolation, temporal interpolation and spatio-temporal interpolation.

  • Traditional methods, such as inverse distance weighting (IDW), assume that the data distribution obeys the first law of geography: the closer data points are in space, the greater their contribution to interpolating missing values.
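
The distance-only weighting that the last highlight attributes to IDW can be made concrete with a minimal textbook estimator. The exponent `power = 2.0` is a common default rather than a value taken from this paper, and the function name `idw` is hypothetical. Because each weight depends only on distance, this estimator has no way to represent spatial heterogeneity on its own.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw(target: Point, samples: Sequence[Tuple[Point, float]],
        power: float = 2.0) -> float:
    """Inverse distance weighting: a weighted mean of observed values with
    weights 1 / d**power, where d is the distance to the target location.
    The default power of 2.0 is an assumed, commonly used setting."""
    if not samples:
        raise ValueError("at least one sample is required")
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station: use its reading
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two stations at (0, 0) and (2, 0) reporting 10.0 and 20.0.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw((1.0, 0.0), stations))  # equidistant, so the plain mean: 15.0
print(idw((0.5, 0.0), stations))  # closer to the first station: about 11.0
```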

Summary

Introduction

Following the rapid development and popularization of geographic information technology and improvements in data collection, data with temporal and spatial attributes have accumulated quickly, forming large numbers of spatio-temporal datasets [1]. Missing data are extremely common in these datasets: examples include missing air quality monitoring sensor readings, missing floating car track points and absent mobile phone signaling records. Because of spatial and temporal heterogeneity, the data distribution can show uneven characteristics and relationships across different regions [15], and the accuracy of the interpolation results obtained by existing methods remains unsatisfactory when data are not homogeneously distributed. To solve this problem, the authors of [16] considered spatial autocorrelation and heterogeneity in the study area and proposed a point estimation model of biased sentinel hospital-based area disease estimation (P-BSHADE). The spatial and temporal weights are determined using the correlation coefficient, and the estimates in the spatial and temporal dimensions are integrated to obtain the overall estimates of the missing data [2]. This method requires the whole dataset to participate in the computation, which leads to high computational complexity and a large volume of redundant data.
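
The correlation-based weighting described above can be sketched as follows. This is an illustration under assumptions, not a reimplementation of [2] or of P-BSHADE: it assumes each dimension's weight comes from the correlation coefficient between that dimension's estimates and the observed values on a set of known (validation) entries, with negative correlations clipped to zero and the two weights normalized to sum to one. The function names `correlation_weights` and `integrate` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_weights(spatial_pred: Sequence[float],
                        temporal_pred: Sequence[float],
                        observed: Sequence[float]) -> Tuple[float, float]:
    """Assumed scheme: weight each dimension by how well its estimates
    correlate with observed values (negative correlations clipped to 0)."""
    rs = max(pearson(spatial_pred, observed), 0.0)
    rt = max(pearson(temporal_pred, observed), 0.0)
    if rs + rt == 0.0:
        return 0.5, 0.5  # no usable signal: fall back to equal weights
    return rs / (rs + rt), rt / (rs + rt)


def integrate(spatial_est: float, temporal_est: float,
              weights: Tuple[float, float]) -> float:
    """Overall estimate = w_s * spatial estimate + w_t * temporal estimate."""
    ws, wt = weights
    return ws * spatial_est + wt * temporal_est


# Validation entries where the true values are known.
observed = [1.0, 2.0, 3.0, 4.0]
spatial_pred = [1.0, 2.0, 3.0, 4.0]   # correlation 1.0 with observed
temporal_pred = [1.0, 3.0, 2.0, 4.0]  # correlation 0.8 with observed
w = correlation_weights(spatial_pred, temporal_pred, observed)
print(w)                         # roughly (0.556, 0.444)
print(integrate(10.0, 20.0, w))  # roughly 14.44
```

In this sketch the correlations are computed over every validation entry supplied, so the cost grows with the size of the data used to fit the weights.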

Method Framework
Coarse-Grained Interpolation
Sliding Window
Fine-Grained Spatial Dimension Interpolation
Fine-Grained Temporal Dimension Interpolation
Spatio-Temporal Integration
Datasets
Evaluation Criteria
Experimental Results
Overall Results
Effect of Coarse-Grained Interpolation
Effect of the Coarse-Grained Missing Data Rate
Effect of Sliding Window
Performance Comparison for Different Datasets