Abstract

Mapping websites and geo portals play a vital role in daily life owing to the availability of geotagged data. From booking a cab to searching for a place, getting traffic information, reading reviews of a place, or finding a doctor or the best school in a locality, we depend heavily on map services and geo portals. The volume of data on these sources is large and grows every moment. These data are collected mainly through crowdsourcing, where people contribute entries. By the basic principle of garbage in, garbage out, the quality of these data determines the quality of the services built on them. A model that can predict the quality or accuracy of geotagged point of interest (POI) data is therefore highly desirable. We propose a novel fine-tuned predictive model that checks the accuracy of these data using the most suitable supervised machine learning approach. This work covers the complete life cycle of model building, from data collection to fine-tuning of the hyperparameters. We cover the challenges particular to geotagged POI data and the remedies that make the data suitable for predictive modeling that classifies records by their accuracy. This is a unique work in that it considers multiple sources, including ground truth data, to verify geotagged data using a machine learning approach. After exhaustive experiments, we obtained the best hyperparameter values for the selected predictive model, built on a real data set prepared specifically for the proposed solution. This work provides a way to develop a robust pipeline for predicting the accuracy of crowdsourced geotagged data.

Highlights

  • We are witnessing a data generation era in which people produce voluminous data every day on platforms such as social media websites, microblogging websites, geo portals, and web mapping websites.

  • Point of interest (POI) data are geotagged data that include the geographic information of a place along with its metadata.

  • We propose a fine-tuned predictive model that predicts the appropriate class of the data based on its accuracy, using state-of-the-art methods and techniques.

Summary

INTRODUCTION

We are witnessing a data generation era in which people produce voluminous data every day on platforms such as social media websites, microblogging websites, geo portals, and web mapping websites. Among these platforms, mapping websites and geo portals provide a wide variety of map data with several important applications, such as traffic conditions, route finding, and business listings. These maps contain a wide variety of point of interest (POI) data, covering public services such as healthcare, schools, hotels, monuments, religious places, courts, open areas, and business points. It is important to measure the quality of geotagged data. Because these data are voluminous, there should be an automatic method or model to check their accuracy. We propose a fine-tuned predictive model that predicts the appropriate class of the data based on its accuracy, using state-of-the-art methods and techniques. Fine-tuning of the hyperparameters is explained in Section VII, and finally we conclude the work.

RELATED WORK
Imbalance Class Dataset
Predictive Modelling
Review Summary
PROBLEM SETTING
Definitions
Tagging the Label
Problem Definition
DATA PREPARATION
MODEL BUILDING
Data Analysis
Standardization and Class Balancing
Model Selection Process
EXPERIMENTAL SETUP
Experimental Results
FINE TUNING
Solver
Regularization
C-Value
CONCLUSION
Name Similarity
Address Similarity
PIN Similarity
Distance Variation
Category Similarity
UserType Coding
WebSrc Count
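
The outline lists Name Similarity and Distance Variation among the items of the paper, but this summary does not say how they are computed. As a rough illustration only, the sketch below assumes that name similarity is a normalized string-match ratio and that distance variation is the great-circle distance between a crowdsourced coordinate and a reference coordinate. The function names `name_similarity` and `distance_variation_m` are hypothetical and are not taken from the paper.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def name_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the names match after
    lower-casing and collapsing whitespace. Hypothetical feature definition."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def distance_variation_m(lat1: float, lon1: float,
                         lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in metres between two points
    given in decimal degrees. Hypothetical feature definition."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
```

Features of this kind could then be fed, per POI record, into a supervised classifier; which classifier and which exact feature definitions the authors use is described in the full text, not here.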