Efficient Sentiment-Aware Web Crawling Methods for Constructing Sentiment Dictionary

Jangwon Gim, Gyu Sang Choi, Byung-Won On, Jun-Young Jo, Soo-Mok Jung, Hyunkwang Shin

doi:10.1109/access.2021.3129187

Abstract

In traditional web crawling, every crawled web page is first stored in a database. This approach can store unnecessary web pages, and it requires additional running time to construct a sentiment dictionary for a particular domain, because sentiment words must be identified by scanning all web pages in the database. To address these problems, we first define the sentiment-aware web crawling problem and then propose two hash-based methods to implement it: one based on hash join and the other on a novel bucket-sorted hash join. Our experimental results show that the proposed web crawling method using bucket-sorted hash join outperforms existing web crawling methods by significantly reducing running time and storage space. With the proposed method, the sentiment-aware task takes 0.016 seconds per web page, and database space is reduced by 59% compared to existing web crawling methods.

Highlights

Data mining techniques that extract useful information hidden in objective facts have long been widely used, but with the spread of smart devices and social network services, recent studies on analyzing and aggregating people's subjective information have become important.
Public opinion and market research are no longer conducted through traditional surveys; instead, relevant data are collected automatically from the web, and the pros and cons on a given question are summarized through sentiment analysis.
In this work, we propose a new sentiment-aware web crawling approach that filters out unnecessary web pages during web crawling.

Summary

INTRODUCTION

Data mining techniques that extract useful information hidden in objective facts have long been widely used, but with the spread of smart devices and social network services, recent studies on analyzing and aggregating people's subjective information have become important. The same process is repeated until the queue is empty. In this manner, traditional web crawling methods tend to store the downloaded web pages in a file system, and all of those web pages must be scanned when a sentiment dictionary for a particular domain is constructed. To make the sentiment-aware task efficient, we propose a solution that fits our problem by borrowing the existing hash join algorithm. We call this approach sentiment-aware web crawling based on hash join. The proposed bucket-sorted hash join method is faster than the hash-join-based method for the sentiment-aware task in web crawling.
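This summary does not give the paper's algorithms, so the following is only a minimal Python sketch of the general idea, written under stated assumptions. It filters pages during a queue-based crawl by probing a hash table of seed sentiment words, and stores a page only when a seed word is found. All names (`build_hash_table`, `probe_sorted`, `crawl`, the in-memory `PAGES` site) are hypothetical. The "bucket-sorted" variant shown here is one assumed interpretation, in which each bucket is kept sorted and probed with binary search; the authors' actual bucket-sorted hash join may differ.

```python
import re
from bisect import bisect_left
from collections import deque

TOKEN_RE = re.compile(r"[a-z']+")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())

def build_hash_table(seed_words, num_buckets=8):
    """Build phase: place each seed sentiment word in a bucket chosen by its hash."""
    buckets = [[] for _ in range(num_buckets)]
    for word in set(seed_words):
        buckets[hash(word) % num_buckets].append(word)
    return buckets

def sort_buckets(buckets):
    """Assumed 'bucket-sorted' step: sort each bucket so it can be binary-searched."""
    return [sorted(bucket) for bucket in buckets]

def probe_linear(buckets, token):
    """Probe phase of a plain hash join: linear scan of the matching bucket."""
    return token in buckets[hash(token) % len(buckets)]

def probe_sorted(buckets, token):
    """Probe phase over sorted buckets: binary search inside the matching bucket."""
    bucket = buckets[hash(token) % len(buckets)]
    i = bisect_left(bucket, token)
    return i < len(bucket) and bucket[i] == token

def is_sentiment_page(text, buckets, probe):
    """A page qualifies as soon as one of its tokens matches a seed word."""
    return any(probe(buckets, token) for token in tokenize(text))

def crawl(seed_urls, fetch, buckets, probe):
    """Breadth-first crawl that stores only pages passing the sentiment check."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    stored = {}
    while queue:  # repeat until the queue is empty
        url = queue.popleft()
        text, links = fetch(url)
        if is_sentiment_page(text, buckets, probe):
            stored[url] = text  # pages without sentiment words are never stored
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return stored

# Hypothetical in-memory "web": url -> (page text, outgoing links).
PAGES = {
    "a": ("The food was great and the staff were friendly", ["b", "c"]),
    "b": ("Opening hours and parking information", ["c"]),
    "c": ("Terrible service, awful experience", []),
}

def fetch_demo(url):
    """Stand-in for an HTTP download; a real crawler would fetch and parse HTML."""
    return PAGES[url]

seeds = ["great", "friendly", "terrible", "awful"]
table = sort_buckets(build_hash_table(seeds))
kept = crawl(["a"], fetch_demo, table, probe_sorted)
print(sorted(kept))
```

In this toy run, pages `a` and `c` contain seed words and are kept, while page `b` is visited for its links but never stored. Swapping `probe_sorted` for `probe_linear` (with or without `sort_buckets`) gives the plain hash-join variant of the same filter.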

RELATED WORK

MAIN PROPOSAL

EXPERIMENTAL VALIDATION

EXPERIMENTAL RESULTS

Findings

DISCUSSION

CONCLUSIONS

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0
