Abstract

Mobile device location data (MDLD) are widely used in many fields. Large-scale applications remain limited, however, because the data from any single vendor have biased or insufficient spatial coverage. One way to improve coverage is to integrate data from several vendors into a more representative dataset. Extracting reliable statistics from MDLD requires certain preprocessing steps to ensure the accuracy of the analysis. One of these steps is a framework for removing duplicated devices, or multiple devices that belong to the same data subject. This treatment is especially necessary when several data sources are combined, because the same device may be captured by more than one data provider. We propose a data integration methodology for multisourced data and use it to investigate the feasibility of integrating data from several sources. Duplicate devices are identified by exploiting the uniqueness of each device's travel pattern. A national-level analysis shows the proposed methodology to be cost-effective. The method is successfully applied to a January 2020 dataset of more than 270 million raw devices nationwide. Our findings suggest that devices sharing the same imputed home location and the same five most-visited locations during a month can represent the same user in the MDLD. More than 99.6% of the sample devices that share these attributes are observed at the same location simultaneously.
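
The matching rule stated in the abstract (same imputed home location and same five most-visited locations within a month) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the input layout (a dictionary mapping a device ID to a home location and a list of visited location IDs), the treatment of the top five as an unordered set, and the rule of skipping devices with fewer than five distinct locations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_locations(visits, k=5):
    """Return the k most-visited location IDs as an unordered set.

    Assumption: ties are broken by location ID so the result is deterministic.
    Returns None when the device has fewer than k distinct locations.
    """
    counts = Counter(visits)
    if len(counts) < k:
        return None
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(loc for loc, _ in ranked[:k])


def find_duplicate_groups(devices, k=5):
    """Group device IDs that share a home location and top-k location set.

    devices: dict of device_id -> (home_location, list of visited location IDs)
    (hypothetical layout; real MDLD would first need home imputation and
    location clustering, which are outside this sketch).
    """
    groups = defaultdict(list)
    for device_id, (home, visits) in devices.items():
        top = top_locations(visits, k)
        if top is None:
            continue  # assumption: too few locations to form a signature
        groups[(home, top)].append(device_id)
    # Only signatures shared by two or more devices indicate candidate duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Toy data: "a" and "b" share a home and top-five set; "c" has a different home.
devices = {
    "a": ("h1", ["p1"] * 5 + ["p2"] * 4 + ["p3"] * 3 + ["p4"] * 2 + ["p5"] * 2 + ["p6"]),
    "b": ("h1", ["p5", "p4", "p3", "p2", "p1"] * 2),
    "c": ("h2", ["p1"] * 5 + ["p2"] * 4 + ["p3"] * 3 + ["p4"] * 2 + ["p5"] * 2 + ["p6"]),
}
duplicate_groups = find_duplicate_groups(devices)
```

Grouping by a hashable signature keeps the pass linear in the number of devices, which matters at the scale the abstract mentions; whether the original study compares the top five as a set or as a ranked list is not stated here.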
