Abstract

In recent years, the prevailing topic crawler algorithms are concentrated on the contents of topical words. These existing approaches neglect the sematic relationship among textual concepts, which lead to low correlation between crawled webpages. To address the issue, this paper presents a deep analysis of Shark Search algorithm, and makes an optimization in terms of incorporating the characteristics associated with semi-structured webpages. Furthermore, we enhance the performance of vector space model utilized in Shark Search algorithm by virtue of domain ontology, and propose a standardized method based on the vector space of ontology model to improve the evaluation metric of TF-IDF. The experimental results demonstrate the effectiveness of our algorithm that outperforms the state-of-the-art significantly in precision and recall.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call