Sentiment Analysis of "AUTOSTRADA.INFO/RU" Users’ Comments

Aliaksandra Svistunova, Arseniy Sazanov, Svyatoslav Seliverstov, Viktoriya Chigur, Yaroslav Seliverstov

doi:10.15622/sp.18.2.354-389

Abstract

Analysis shows that social networks (Vkontakte, Facebook), thematic communities on microblogging platforms (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are sources of up-to-date, real-time information about the traffic situation, the quality of transport services, and passenger satisfaction with that quality. However, existing transport monitoring systems do not include software tools capable of collecting and analyzing traffic information published on the Internet. This paper addresses the task of building a system that automatically retrieves and classifies road traffic information from transport Internet portals, and of testing that system on the transport networks of Crimea and the city of Sevastopol. To solve this problem, open-source libraries for thematic data collection and analysis were reviewed, and an algorithm for extracting and analyzing texts was developed. A crawler was written with the Scrapy package in Python 3, and user reviews of the state of the transport system of Crimea and the city of Sevastopol were collected from the portal http://autostrada.info/ru. For text lemmatization and vectorization, the tf, idf, and tf-idf methods and their implementations in the Scikit-Learn library (CountVectorizer and TfidfVectorizer) were considered. For text representation, the Bag-of-Words and n-gram models were considered. The classifier model was developed using the naive Bayes algorithm (MultinomialNB) and a linear classifier trained with stochastic gradient descent (SGDClassifier). A corpus of 225,000 labeled texts from Twitter served as the training sample. The classifier was trained using a cross-validation strategy with the ShuffleSplit method. The sentiment (tone) classification results were tested and compared.
According to the validation results, the linear model with the n-gram range [1, 3] and the TF-IDF vectorizer performed best. During the trial of the developed system, reviews concerning the quality of the transport networks of the Republic of Crimea and the city of Sevastopol were collected and analyzed. Conclusions are drawn, and prospects for further functional development of the tools are outlined.
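The configuration that the abstract reports as best (a TF-IDF vectorizer over 1- to 3-grams feeding a linear SGD classifier, validated with ShuffleSplit) can be sketched with Scikit-Learn roughly as follows. This is a minimal illustration and not the authors' code: the toy English corpus, the labels, the helper name `build_model`, and all hyperparameters other than `ngram_range=(1, 3)` are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

# Toy labeled corpus, invented for illustration (1 = positive, 0 = negative).
# The paper trains on 225,000 labeled Twitter texts instead.
texts = [
    "the road is smooth and new",
    "great highway with good asphalt",
    "excellent road surface, pleasant drive",
    "good markings and smooth asphalt",
    "new bridge, great drive",
    "pleasant smooth highway",
    "terrible potholes everywhere",
    "awful road, broken asphalt",
    "bad surface and deep potholes",
    "broken bridge, terrible traffic jam",
    "awful traffic, bad markings",
    "deep potholes, awful drive",
]
labels = [1] * 6 + [0] * 6


def build_model():
    """Hypothetical helper: TF-IDF over 1- to 3-grams, then a linear SGD classifier."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3)),
        SGDClassifier(random_state=0),
    )


# ShuffleSplit draws independent random train/test partitions.
# The split count and test size here are assumed values.
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
scores = cross_val_score(build_model(), texts, labels, cv=cv)
print("mean accuracy over splits:", scores.mean())

# Fit on the whole toy corpus and classify a new, unseen review.
model = build_model().fit(texts, labels)
prediction = model.predict(["deep potholes on the bridge"])
print("predicted label:", prediction[0])
```

Swapping `SGDClassifier` for `MultinomialNB` (from `sklearn.naive_bayes`) in `build_model` gives the naive Bayes variant, so both classifiers can be compared under the same splits.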

Highlights

Social networks (Vkontakte, Facebook), thematic communities on microblogging platforms (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are sources of up-to-date, real-time information about the traffic situation, the quality of transport services, and passenger satisfaction with that quality.
Existing transport monitoring systems do not include software tools capable of collecting and analyzing traffic information published on the Internet.
This paper addresses the task of building a system that automatically retrieves and classifies road traffic information from transport Internet portals, and of testing that system on the transport networks of Crimea and the city of Sevastopol.

Summary

Sentiment classifier model

The Bag of Words model [33] gives a compact representation of a document in which every word $w_t \in V$ of the vocabulary $V$ has an occurrence count $n_t$ in document $d_i$; any document $d_i$ can therefore be represented as a vector of the form [32]: $d_i = (n_1, n_2, \dots, n_{|V|})$. The model is built as follows: 1) a vocabulary of terms is compiled from all words occurring in the text, after all punctuation marks, numbers, and stop words have been removed from the text; 2) for each document, a vector is defined in which each component corresponds to a vocabulary term and its value is the number of times that word occurs in the text. To build the sentiment classifier model, we consider and compare the two most widely used classification models: the naive Bayes classifier and a linear classifier based on stochastic gradient descent. Let $V$ be the vocabulary; then document $d_i$ is a vector of length $|V|$ consisting of bits $B_{it}$, where $B_{it} = 1$ if word $w_t$ occurs in document $d_i$ and $B_{it} = 0$ otherwise.
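The two-step Bag of Words construction and the bit-vector representation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the stop-word list, the sample documents, and the function names (`tokenize`, `build_vocabulary`, `count_vector`, `bit_vector`) are invented for the example, and no lemmatization is performed.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "is", "and", "a", "on", "of", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only.

    The pattern matches runs of letters, so punctuation and numbers are dropped;
    stop words are then filtered out.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(documents):
    """Step 1: vocabulary V of all terms found in the corpus (sorted for stable order)."""
    return sorted({w for doc in documents for w in tokenize(doc)})


def count_vector(doc, vocabulary):
    """Step 2: component t holds n_t, the number of occurrences of w_t in the document."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocabulary]


def bit_vector(doc, vocabulary):
    """Binary variant: component t is 1 if w_t occurs in the document, else 0."""
    present = set(tokenize(doc))
    return [1 if w in present else 0 for w in vocabulary]


docs = [
    "The road is bad, bad road!",
    "Road repaired in 2018: good asphalt.",
]
vocab = build_vocabulary(docs)
print(vocab)                         # ['asphalt', 'bad', 'good', 'repaired', 'road']
print(count_vector(docs[0], vocab))  # [0, 2, 0, 0, 2]
print(bit_vector(docs[0], vocab))    # [0, 1, 0, 0, 1]
```

The count vector is the representation typically paired with a multinomial naive Bayes model, while the bit vector corresponds to the binary document model in which only the presence of a word matters.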

[Equation residue; recoverable symbols: $B_{it}$, $p(w_t \mid c_j)$, $X^l$, $x_i$]

Sevastopol ring road (Окружная Севастополя)


Journal: Труды СПИИРАН	Publication Date: Apr 12, 2019
Citations: 2	License type: CC BY 4.0


