Abstract

Large-scale general-purpose search engines cannot meet the demand for “specialized, sophisticated and deep” information in mineral intelligence services. Vertical search engines have emerged in response to this need, and the focused crawler is the key technology behind them. This paper proposes a hybrid topic strategy, based on both text content and web link structure, that is tailored to the characteristics of the mineral information domain. To improve the focused crawler's ability to determine topic relevance, the paper combines HowNet with word embedding techniques from natural language processing to perform text-based topic relevance determination. It also introduces the HITS algorithm, which operates on the link structure of web pages. The text-content strategy and the link-structure strategy are then combined to recognize and predict the topic of a web page. Simulation experiments show that the proposed method achieves high recall and precision when acquiring mineral intelligence information from the Internet.
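
The combination described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how the two scores are merged, so the weighted sum, the `alpha` parameter, and the function names `cosine`, `hits`, and `hybrid_score` are all assumptions made for illustration. Text relevance is approximated here by cosine similarity between embedding vectors only; the HowNet-based component is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hits(graph, iterations=50):
    """Standard HITS power iteration.

    graph maps each page to the list of pages it links to.
    Returns (hubs, authorities), each L2-normalized.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                new_auths[q] += hubs[p]
        norm = math.sqrt(sum(a * a for a in new_auths.values())) or 1.0
        auths = {p: a / norm for p, a in new_auths.items()}
        # Hub update: sum the authority scores of pages linked to.
        new_hubs = {p: sum(auths[q] for q in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(h * h for h in new_hubs.values())) or 1.0
        hubs = {p: h / norm for p, h in new_hubs.items()}
    return hubs, auths


def hybrid_score(text_sim, authority, alpha=0.7):
    """Assumed combination rule: weighted sum of text and link scores."""
    return alpha * text_sim + (1 - alpha) * authority


# Toy link graph: pages "a" and "b" both link to "c".
graph = {"a": ["c"], "b": ["c"], "c": []}
hubs, auths = hits(graph)

# Hypothetical embedding vectors for the topic and for page "c".
topic_vec = [1.0, 0.0]
page_vec = [1.0, 0.0]
score = hybrid_score(cosine(topic_vec, page_vec), auths["c"])
```

In a crawler, a score of this kind would typically be used to prioritize the URL frontier, so that pages predicted to be on topic are fetched first. The value of `alpha` would need to be tuned on labeled data.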
