Abstract
Missing data reconstruction is a critical step in the analysis and mining of spatio-temporal data; however, few studies comprehensively consider missing data patterns, sample selection and spatio-temporal relationships. As a result, traditional methods often fail to achieve satisfactory accuracy or to cope with high levels of complexity. To address these problems, this study developed an effective two-step method for spatio-temporal missing data reconstruction (ST-2SMR). The approach includes a coarse-grained interpolation method that takes missing patterns into account and can eliminate the influence of continuous missing data on the overall results. Based on the results of coarse-grained interpolation, a dynamic sliding window selection algorithm was implemented to determine the most relevant sample data for fine-grained interpolation, considering both spatial and temporal heterogeneity. Finally, the spatial and temporal interpolation results were integrated using a neural network model. We validated our approach using Beijing air quality data and found that the proposed method outperforms existing solutions in terms of estimation accuracy and reconstruction rate.
Highlights
Following the rapid development and popularization of geographic information and improvements in data collection, data with temporal and spatial attributes accumulate quickly and form large numbers of spatio-temporal datasets [1]. Missing data are extremely common in such datasets; examples include missing air quality monitoring sensor readings, missing floating car track points and absent mobile phone signaling records.
A large number of interpolation methods have been proposed to solve the problem of spatio-temporal missing data [4,5,6,7,8,9,10]. These methods can be roughly divided into three categories: spatial interpolation, temporal interpolation and spatio-temporal interpolation.
Traditional methods (e.g., inverse distance weighting (IDW)) assume that the data distribution obeys the first law of geography, namely, that the closer observations are in space, the greater the contribution they make to missing data interpolation.
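The distance-weighting idea behind IDW can be illustrated with a minimal sketch. This is a generic textbook form of IDW, not code from the study; the function name `idw_estimate`, the planar coordinates, the sample readings and the default exponent `power=2.0` are all assumptions chosen for illustration.

```python
import math


def idw_estimate(target, neighbors, power=2.0):
    """Estimate a missing value at `target` from known neighbors.

    target:    (x, y) location of the missing reading (hypothetical planar coords)
    neighbors: list of ((x, y), value) pairs with known readings
    power:     distance exponent; 2.0 is a common default, assumed here
    """
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        distance = math.hypot(x - tx, y - ty)
        if distance == 0.0:
            # A neighbor at the target location determines the value directly.
            return value
        # Closer neighbors receive larger weights (first law of geography).
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbor is required")
    return weighted_sum / weight_total


# Hypothetical readings: the station at distance 1 outweighs the one at distance 2.
stations = [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
estimate = idw_estimate((0.0, 0.0), stations)
```

Because the weights depend only on distance, a sketch like this treats every region identically, which is why such methods can struggle when the data are spatially heterogeneous.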
Summary
Following the rapid development and popularization of geographic information and improvements in data collection, data with temporal and spatial attributes accumulate quickly and form large numbers of spatio-temporal datasets [1]. Missing data are extremely common in such datasets; examples include missing air quality monitoring sensor readings, missing floating car track points and absent mobile phone signaling records. Due to spatial and temporal heterogeneity, the data distribution can show uneven characteristics and relationships across different regions [15]; the accuracy of interpolation results obtained by existing methods remains unsatisfactory if data are not homogeneously distributed. To solve this problem, [16] considered spatial autocorrelation and heterogeneity in a study area and proposed a point estimation model of biased sentinel hospital-based area disease estimation (P-BSHADE). Using the correlation coefficient to determine the spatial and temporal weights, estimated values in the spatial and temporal dimensions are integrated to obtain overall estimated values of missing data [2]. This method requires the whole dataset to participate in the computation, which leads to high computational complexity and a large volume of redundant data.
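As a rough illustration of correlation-based integration, the sketch below combines a spatial and a temporal estimate using weights proportional to the absolute Pearson correlation coefficients. The text above does not give the exact weighting formula used in [2], so the normalization, the function names `pearson` and `combine_estimates`, and the sample numbers are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def combine_estimates(spatial_est, temporal_est, r_spatial, r_temporal):
    """Blend two estimates with weights proportional to |r| (assumed scheme)."""
    w_s, w_t = abs(r_spatial), abs(r_temporal)
    total = w_s + w_t
    if total == 0.0:
        # No usable correlation in either dimension: fall back to a plain mean.
        return (spatial_est + temporal_est) / 2.0
    return (w_s * spatial_est + w_t * temporal_est) / total


# Hypothetical values: the spatial estimate is trusted three times as much.
blended = combine_estimates(10.0, 20.0, r_spatial=0.9, r_temporal=0.3)
```

Computing the correlations over the whole dataset, as a scheme of this kind would require, is what drives the computational cost noted above.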