A focused crawler based on semantic disambiguation vector space model

Wenjun Liu, Xing Liu, Jing Wu, Tiejun Xi, Xiaoping Huang, Yajun Du, Pengjun Jiang, Yu He, Zurui Gan

doi:10.1007/s40747-022-00707-8

Abstract

A focused crawler continuously grabs web pages related to a given topic according to the priorities of unvisited hyperlinks. In many previous studies, focused crawlers predict the priorities of unvisited hyperlinks using text similarity models. However, the terms chosen to represent a web page ignore polysemy, and the topic similarity of the text does not combine cosine similarity and semantic similarity effectively. To address these problems, this paper proposes a focused crawler based on a semantic disambiguation vector space model (SDVSM). The SDVSM method combines the semantic disambiguation graph (SDG) and the semantic vector space model (SVSM). The SDG is used to remove ambiguous terms that are irrelevant to the given topic from the representation terms of retrieved web pages. The SVSM is used to calculate the topic similarity of the text by constructing text and topic semantic vectors based on the TF × IDF weights of terms and the semantic similarities between terms. Experimental results comparing four focused crawlers across different evaluation indicators indicate that the SDVSM method can improve the performance of the focused crawler. In conclusion, the proposed method enables the focused crawler to grab more, and higher-quality, web pages related to the given topic from the Internet.
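The abstract does not give the SVSM formula, so the following is only a minimal sketch of the general idea it describes: weighting terms by TF × IDF and letting semantic similarity between different terms contribute to the topic similarity score. It uses the generic soft cosine measure as a stand-in, not the authors' method. The function names, the smoothed IDF variant, and the toy similarity table are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(doc_terms, corpus):
    """Return {term: TF x IDF weight} for one document.

    corpus is a list of term lists. The smoothed IDF used here is an
    assumption; the paper may define its weights differently.
    """
    counts = Counter(doc_terms)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (count / len(doc_terms)) * idf
    return weights


def soft_cosine(vec_a, vec_b, term_sim):
    """Cosine similarity in which every term pair is scaled by term_sim."""
    def dot(x, y):
        return sum(wx * wy * term_sim(tx, ty)
                   for tx, wx in x.items()
                   for ty, wy in y.items())

    denom = math.sqrt(dot(vec_a, vec_a)) * math.sqrt(dot(vec_b, vec_b))
    return dot(vec_a, vec_b) / denom if denom else 0.0


# Hypothetical similarity table; a real system would derive these values
# from a lexical resource or an embedding model.
TOY_SIM = {frozenset(("crawler", "spider")): 0.8}


def toy_sim(t1, t2):
    if t1 == t2:
        return 1.0
    return TOY_SIM.get(frozenset((t1, t2)), 0.0)


def exact_match(t1, t2):
    """Plain cosine similarity: only identical terms count."""
    return 1.0 if t1 == t2 else 0.0


topic = ["focused", "crawler"]
page = ["topical", "spider"]
corpus = [topic, page]

topic_vec = tf_idf(topic, corpus)
page_vec = tf_idf(page, corpus)

plain = soft_cosine(topic_vec, page_vec, exact_match)
semantic = soft_cosine(topic_vec, page_vec, toy_sim)
print(plain, semantic)
```

In this toy case the topic and the page share no terms, so the exact-match score is zero, while the semantic variant gives the page a positive score because "crawler" and "spider" are treated as related. A crawler could use such a score to rank unvisited hyperlinks, though how SDVSM assigns priorities is not specified in the abstract.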

Journal: Complex & Intelligent Systems
Publication Date: Jul 5, 2022
License type: open-access

Similar Papers

An improved focused crawler based on Semantic Similarity Vector Space Model
Yajun Du ... Guoli Peng
Applied Soft Computing | VOL. 36
01 Aug 2015

A Semantic Aspect-Based Vector Space Model to Identify the Event Evolution Relationship within Topics
Yaoyi Xi ... Bicheng Li
Journal of Computing Science and Engineering | VOL. 9
30 Jun 2015

A knowledge recommendation approach in design for multi-material 4D printing based on semantic similarity vector space model and case-based reasoning
Saoussen Dimassi ... Jean-Claude André
Computers in Industry | VOL. 145
23 Nov 2022

A distributed semantic similar search for high-dimensional resources in low-dimensional content addressable network
Qingyuan Hu ... Yang Ji
01 Sep 2013
