Abstract
About 60,000 road sensors are located on the Dutch road network, of which 20,000 are on the highways. Both vehicle counts and average speed are collected each minute and stored in the National Traffic Da[...]. [...] traffic statistics, several methodological challenges needed to be solved. The first was developing a method to check and improve the data quality, as many sensors lacked data for many minutes during the day. A cleaning and estimation step was implemented that enabled a precise and accurate estimate of the number of vehicles actually passing the sensors in each minute. The second challenge was monitoring the stream of incoming and outgoing data and controlling this fully automatic statistical process. This required defining quality indicators on the raw and processed sensor data. The third challenge was determining calibration weights based on the geographic locations of the road sensors on the roads. This was needed because road sensors are not uniformly distributed over the road network. As the number of active sensors fluctuates over time, the weights need to be determined periodically. As a result of these steps, accurate numbers could be produced on the traffic intensity during various periods in regions of the Netherlands.
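The role of calibration weights can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sensor on a single road represents the stretch of road closest to it, so that sensors in sparsely covered parts receive larger weights. The function names `segment_weights` and `weighted_mean_count`, the kilometre positions, and the nearest-sensor rule are all assumptions made for this example; they are not the weighting method of the paper.

```python
from typing import List, Sequence


def segment_weights(positions_km: Sequence[float],
                    road_start_km: float,
                    road_end_km: float) -> List[float]:
    """Weight each sensor by the share of road length closest to it.

    Hypothetical rule: the boundary between two neighbouring sensors
    is the midpoint between them; the first and last sensors extend
    to the ends of the road.
    """
    if not positions_km:
        raise ValueError("at least one sensor position is required")
    if road_end_km <= road_start_km:
        raise ValueError("road_end_km must be larger than road_start_km")

    # Sort sensors along the road but remember their original order.
    order = sorted(range(len(positions_km)), key=lambda i: positions_km[i])
    pos = [positions_km[i] for i in order]

    # Segment boundaries: road start, midpoints between sensors, road end.
    midpoints = [(a + b) / 2 for a, b in zip(pos, pos[1:])]
    bounds = [road_start_km] + midpoints + [road_end_km]
    lengths = [hi - lo for lo, hi in zip(bounds, bounds[1:])]

    total = road_end_km - road_start_km
    weights = [0.0] * len(pos)
    for original_index, length in zip(order, lengths):
        weights[original_index] = length / total
    return weights


def weighted_mean_count(counts: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted mean of per-sensor counts; weights are assumed to sum to 1."""
    return sum(w * c for w, c in zip(weights, counts))


# Three sensors on a 10 km road, two of them clustered near the start.
weights = segment_weights([1.0, 2.0, 9.0], road_start_km=0.0, road_end_km=10.0)
counts = [30, 20, 10]  # vehicles per minute at each sensor (made-up values)
print(weights)
print(weighted_mean_count(counts, weights))
```

In this made-up example the unweighted mean of the three counts is 20 vehicles per minute, while the weighted mean is about 17, because the two clustered sensors together represent only a little over half of the road. Since the set of active sensors changes over time, such weights would have to be recomputed whenever a sensor drops out or returns.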
Highlights
Big data is a very interesting data source for official statistics
Its use brings a lot of challenges on how to create statistics based on such data sources [1]
The core statistical process that will be considered in this paper is the cleaning process of road sensor data
Summary
Big data is a very interesting data source for official statistics. Its use brings many challenges in how to create statistics based on such data sources [1]. In some cases the amount is so large that even checking a small fraction of the data is a huge task. In such cases, we can only check the quality and clean big data using a fully automated process. In the Netherlands, minute-based vehicle counts are gathered at 24,000 sites by approximately 60,000 road sensors. Since vehicles pass sensors at different speeds and the sampling frequency is limited to 'only' one sample per minute, one does not find a large correlation between the data of two sensors, even if they are only 250 meters apart. This makes it hard to clean the data purely by comparing the findings of nearby sensors.
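Because neighbouring sensors are a poor reference, an automated check has to rely largely on each sensor's own data. As a minimal sketch of such a check, the example below computes the share of minutes in a day for which a sensor reported no count and flags sensors above a threshold. The indicator, the function names `missing_fraction` and `flag_sensors`, and the 20% threshold are assumptions chosen for illustration, not the quality indicators defined in the paper.

```python
from typing import Dict, Optional, Sequence

MINUTES_PER_DAY = 1440


def missing_fraction(counts: Sequence[Optional[int]]) -> float:
    """Share of minutes without a reported vehicle count (None = missing)."""
    if not counts:
        return 1.0  # a sensor that reported nothing is treated as fully missing
    missing = sum(1 for c in counts if c is None)
    return missing / len(counts)


def flag_sensors(day_counts: Dict[str, Sequence[Optional[int]]],
                 max_missing: float = 0.2) -> Dict[str, bool]:
    """Return True for each sensor whose missing share exceeds max_missing.

    The threshold is a hypothetical value for this example.
    """
    return {
        sensor_id: missing_fraction(counts) > max_missing
        for sensor_id, counts in day_counts.items()
    }


# Made-up data: sensor "A" reports every minute, sensor "B" misses 440 minutes.
day = {
    "A": [5] * MINUTES_PER_DAY,
    "B": [5] * 1000 + [None] * 440,
}
print(flag_sensors(day))
```

With these made-up series, sensor "A" passes and sensor "B" is flagged, since 440 of its 1,440 minutes are missing. A check of this kind needs no human inspection and no comparison with other sensors, which is what makes it usable at this data volume.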