Abstract

Missing-data problems often occur in Internet-of-Things domains. This article proposes a missing type-aware interpolation framework (IMA) for data loss problems in city-wide environmental monitoring systems that contain many scattered stations. To interpolate data as accurately as possible, IMA considers three kinds of information (spatiotemporal correlation, all attributes of one measurement, and all values) and accordingly develops three methods to estimate the missing data. First, we develop an improved multiviewer method, which uses the spatiotemporal correlation of data from neighboring stations to estimate randomly missing values. Second, we propose a new multi-eXtreme Gradient Boosting (multi-XGBoost) method that uses the values of the co-occurring, correlated, and correct attributes to predict the value of the missing attribute. Third, we use matrix factorization to estimate the missing parts when the data of the interpolation matrix are not all missing. To avoid the influence of uncorrelated data, IMA calculates Pearson's correlation coefficient between the data of the stations and, for each station, uses the data from its k most highly correlated neighbors to form an interpolation matrix. Furthermore, because missing cases are complex, IMA assigns confidence levels to each of the three data prediction methods. For example, if the multiviewer method fails, IMA weights all valid results by their confidence levels. We conduct our experiments on two real-world datasets from air quality monitoring stations in Beijing. Both datasets contain numerous missing measurements. Experimental results show that IMA outperforms counterpart methods in interpolating the missing measurements, in terms of accuracy and effectiveness. Compared with the most closely related method, IMA improves the interpolation accuracy from 0.818 to 0.849 on the small dataset and from 0.214 to 0.759 on the large one.
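The neighbor-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array layout (time steps by stations, with NaN marking a missing value), the use of the signed rather than absolute correlation, and the minimum-overlap threshold are all assumptions made for illustration.

```python
import numpy as np


def top_k_correlated_neighbors(readings, target, k, min_overlap=3):
    """Return the indices of the k stations most correlated with `target`.

    readings: 2-D float array, shape (time steps, stations); NaN = missing.
    Assumption: "highest correlation" means the largest signed Pearson r.
    """
    scores = []
    for j in range(readings.shape[1]):
        if j == target:
            continue
        # Use only time steps where both stations reported a value.
        both = ~np.isnan(readings[:, target]) & ~np.isnan(readings[:, j])
        if both.sum() < min_overlap:
            continue  # too little overlap to estimate a correlation
        x = readings[both, target]
        y = readings[both, j]
        if x.std() == 0 or y.std() == 0:
            continue  # Pearson's r is undefined for a constant series
        r = np.corrcoef(x, y)[0, 1]
        scores.append((r, j))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scores[:k]]


def build_interpolation_matrix(readings, target, k):
    """Stack the target station's column with its k selected neighbors."""
    neighbors = top_k_correlated_neighbors(readings, target, k)
    return readings[:, [target] + neighbors], neighbors
```

The resulting matrix keeps the target station in its first column, so any of the three estimation methods could, under these assumptions, be applied to it without seeing data from weakly correlated stations.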
