Abstract

Web genre plays an important role in focused crawling, web link analysis, and contextual advertising. In this paper, web genre is defined as the functional purpose of a website and the type of information it contains. Intelligent web genre classification can therefore predict a website's content and functional type. However, two critical challenges hinder web genre classification: the lack of a web genre classification dataset and the lack of an efficient classification mechanism. To improve web genre classification performance, we crawled Chinese websites of different web genres and converted the crawled data into a hierarchical multilabel classification dataset. A website knowledge graph is constructed from the relationships between websites and their meta tag features. Using entity features extracted from the knowledge graph, we propose an online web genre classification model based on hierarchical multilabel classification (OWGC-HMC) to mine the functional purpose of the corresponding website. Experimental results show that our OWGC-HMC model can mine the hierarchical multilabel structure of web genre and outperforms other web genre classification methods.
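
To make the notion of a hierarchical multilabel target concrete, the following is a minimal sketch, not the OWGC-HMC model itself. It assumes a hypothetical two-level genre hierarchy (the label names, `PARENT`, `expand_with_ancestors`, and `to_multi_hot` are illustrative and do not come from the paper) and shows how a website tagged with leaf genres could be encoded as a multi-hot vector that also activates every ancestor genre.

```python
# Hypothetical genre hierarchy (illustrative only): child label -> parent label.
PARENT = {
    "news/sports": "news",
    "news/finance": "news",
    "shopping/electronics": "shopping",
}

# Fixed label order so that every website maps to a vector of the same length.
LABEL_INDEX = sorted(set(PARENT) | set(PARENT.values()))


def expand_with_ancestors(labels, parent=PARENT):
    """Return the given labels together with all of their ancestors."""
    expanded = set()
    for label in labels:
        # Walk up the hierarchy until a root (a label with no parent) is reached.
        while label is not None:
            expanded.add(label)
            label = parent.get(label)
    return expanded


def to_multi_hot(labels, label_index=LABEL_INDEX):
    """Encode labels as a 0/1 vector consistent with the hierarchy."""
    expanded = expand_with_ancestors(labels)
    return [1 if name in expanded else 0 for name in label_index]


if __name__ == "__main__":
    print(LABEL_INDEX)
    print(to_multi_hot({"news/sports", "shopping/electronics"}))
```

Encoding ancestors explicitly keeps every target vector consistent with the hierarchy: a website cannot carry a child genre without its parent genre.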
