Abstract

Large-scale general-purpose search engines cannot meet the demand for "specialized, sophisticated and deep" information in mineral intelligence services. Vertical search engines have emerged in response, and the focused crawler is the key technology for building them. Tailored to the characteristics of the mineral information field, this paper proposes a hybrid topic strategy based on both text content and web link structure. To improve the focused crawler's ability to judge topic relevance, the paper introduces HowNet and word embedding techniques from natural language processing and combines them for text-based topic relevance determination. It also introduces the HITS algorithm, which operates on the link structure of web pages. Combining the text-content strategy with the link-structure strategy lets the crawler recognize and predict the topic of a web page. Simulation experiments show that the proposed method achieves high recall and precision when acquiring mineral intelligence information from the Internet.
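The hybrid idea can be illustrated with a minimal sketch, under stated assumptions. It computes HITS hub and authority scores on a small link graph, uses cosine similarity between embedding vectors as a stand-in for the text-based relevance score, and blends the two linearly. HowNet is not modeled here. The function names `hits`, `cosine` and `hybrid_score`, as well as the weight `alpha`, are hypothetical illustrations and not the paper's actual formulation.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS on a link graph given as {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub score of a page: sum of the authority scores of pages it links to.
        new_hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_score(text_sim, authority, alpha=0.7):
    """Hypothetical linear blend of text relevance and link authority.

    alpha is an assumed weight, not a value taken from the paper.
    """
    return alpha * text_sim + (1 - alpha) * authority


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": []}
    hubs, auths = hits(graph)
    topic_vec = [0.9, 0.1, 0.3]  # made-up topic embedding
    page_vec = [0.8, 0.2, 0.4]   # made-up page embedding
    print(hybrid_score(cosine(topic_vec, page_vec), auths["c"]))
```

A linear blend is only the simplest way to merge the two signals. A crawler could equally use the combined score to rank its URL frontier, or apply a threshold to decide whether to follow a link.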
