Abstract

The paper addresses the task of analyzing traffic safety in the Northwestern Federal District according to the reviews published in the Web. To accomplish the task, the authors developed a system of automatic review classification based on a sentiment classifier. They analyzed open source libraries for data mining, developed a web crawler using Scrapy framework, written in Python 3, and collected reviews. They also considered the methods of text vectorization and lemmatization and their application in the Scikit-Learn library: Bag-of-Words, N-gram, CountVectorizer, and TF-IDF Vectorizer. For the purpose of classification, the authors used the naïve Bayes algorithm and a linear classifier model with stochastic gradient descent optimization. A base of tagged Twitter reviews was used as a training set. The classifier was trained using cross-validation and ShuffleSplit strategies. The authors also tested and compared the classification results for different classifiers. As a result of validation, the best model was determined. The developed system was applied to analyze the quality of roads in the Northwestern Federal District. Based on the outcome, the roads were marked-up in color to illustrate the results of the research.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.