Abstract

The data generated by social media such as Twitter are classified as big data, and their usability can provide a wide range of resources to various study areas, including disaster management, tourism, political science, and health. However, beyond the acquisition of the data, their reliability and accuracy concern scientists, because the use of social media data (SMD) may lead to incorrect and unreliable inferences. Many studies have analyzed SMD to investigate their reliability, accuracy, or credibility, but they have not dealt with the filtering techniques applied to the data after their acquisition and before the results are created. This study provides a methodology for assessing the accuracy and reliability of filtering techniques for SMD, followed by a spatial similarity index that analyzes and compares spatial intersection, proximity, and size. Finally, we offer a comparison that shows the best combination of filtering techniques and similarity indices for creating event maps from SMD with the Getis-Ord Gi* technique. The steps of this study can be summarized as follows: first, an investigation of domain-based text filtering techniques, covering sentiment lexicons, the reliability of machine learning-based sentiment analyses, and the development of intermediate codes specific to domain-based studies; then, the determination of the spatial reliability and accuracy of maps of the filtered social media data by using various similarity indices. The study offers the best combination of filtering, mapping, and spatial accuracy investigation methods for social media data, especially in emergencies, where spatial information is urgently required. As a result, a new similarity index based on spatial intersection, spatial size, and proximity relationships is introduced to determine the spatial accuracy of the fine-filtered SMD.
The motivation for this research is to develop the ability to create an incidence map shortly after a disaster event such as a bombing. However, the proposed methodology can also be used for various domains such as concerts, elections, natural disasters, marketing, etc.
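The Getis-Ord Gi* statistic mentioned above can be illustrated with a minimal sketch. It follows the standard published form of the statistic and is not the study's own implementation; the grid of cells, the tweet counts, and the binary neighbor weights (each cell counted as its own neighbor) are assumptions made purely for illustration.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every cell.

    values  : 1-D array of per-cell counts (e.g. filtered tweets per cell).
    weights : n x n spatial weights matrix; the diagonal is non-zero because
              Gi* includes the cell itself in its own neighborhood.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    # Global standard deviation in the form used by the Gi* definition.
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical example: six cells along a line, with made-up tweet counts.
counts = np.array([1, 2, 1, 8, 9, 10])
idx = np.arange(counts.size)
# Binary weights: a cell neighbors itself and the cells directly beside it.
weights = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)

z = getis_ord_gi_star(counts, weights)
hot_spots = np.where(z > 1.96)[0]    # cells significant at roughly the 95% level
cold_spots = np.where(z < -1.96)[0]
```

Cells with large positive z-scores are candidate hot spots and cells with large negative z-scores are candidate cold spots. In practice, an established spatial statistics library and a weights scheme suited to the study area would likely be preferable to this hand-rolled version.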

Highlights

  • This study seeks an assessment methodology to quantify the impact of filtering techniques on the spatial reliability of social media data (SMD), in order to obtain the most suitable, reliable, and accurate SMD for a given subject and application

  • The social media data generated by billions of human sensors throughout the world, and by nearly half of the total population of Turkey, are a crucially significant data source during and after a disaster

  • The study focused on two main investigations: the use of common approaches for domain-based filtering of SMD in the Turkish language, and the spatial reliability of the incidence maps produced from the domain-based filtered SMD

Summary

Introduction

This study seeks an assessment methodology to quantify the impact of filtering techniques on the spatial reliability of social media data (SMD), in order to obtain the most suitable, reliable, and accurate SMD for a given subject and application. As Castillo Ocaranza, Mendoza, and Poblete Labra [21] indicate in their study, filtering tweets in Spanish and ensuring their credibility requires manual labeling because of the possibility of non-relevant classifications. Situations of this type, involving filtering and language, affect the accuracy and reliability of the resulting maps (event, hazard, or risk maps). In this part, two different sentiment lexicons for the Turkish language and three different machine-learning techniques are presented for automatically filtering relevant content. Although this tidying step was used to regularize the text content for further processing, the data might still have included numerous irregularities, such as spelling errors, jargon, and slang, that could decrease the performance of further processing.
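As a rough illustration of lexicon-based filtering, the sketch below scores each tweet by summing word polarities and keeps tweets whose score falls at or below a threshold. The lexicon entries, polarity values, threshold, and function names are all hypothetical and are not taken from the study or from any published Turkish sentiment lexicon. The tokenizer is also deliberately naive: it performs no stemming or spelling correction and does not apply Turkish-specific case folding for dotted and dotless "i".

```python
import re

# Hypothetical polarity lexicon (word -> score); values are made up.
LEXICON = {
    "patlama": -1.0,  # explosion
    "bomba": -1.0,    # bomb
    "yaralı": -0.8,   # injured
    "konser": 0.5,    # concert
    "güzel": 0.9,     # beautiful
}

TOKEN = re.compile(r"\w+")

def lexicon_score(text, lexicon):
    """Sum the polarity of every token found in the lexicon."""
    tokens = TOKEN.findall(text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)

def filter_relevant(tweets, lexicon, threshold=-0.5):
    """Keep tweets whose summed polarity is at or below the threshold."""
    return [t for t in tweets if lexicon_score(t, lexicon) <= threshold]

tweets = [
    "meydanda patlama oldu, çok yaralı var",
    "konser çok güzel",
    "bugün hava nasıl",
]
relevant = filter_relevant(tweets, LEXICON)
```

With these made-up values only the first tweet is kept. Spelling errors, jargon, and slang of the kind noted above would go unmatched by such a lookup, which is one reason machine-learning classifiers are considered alongside lexicons.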

Data Exploration
Commonality Cloud
Comparison Cloud
Pyramid Plot
Word Dendrogram
Data Processing
Interpretation of Spatial SMD
Spatial Clustering
Spatial Similarity Calculation
Data Tidying
Spatial Interpretation over Fine-Filtered SMD
Outcomes
Findings
Conclusions