Abstract

This paper proposes a news page classification algorithm based on a domain ontology. Current classification algorithms consider only content similarity. To address this shortcoming, this paper presents a semantic classification method that considers both content similarity and structural correlation. First, the ontology is parsed to obtain the ontology category vector, keywords are extracted from the news page texts, and their semantic dimensionality is reduced. The vocabulary shared by the page text and the ontology category vector then forms the text expectation vector, and the content similarity between the ontology category vector and the text expectation vector is computed as the cosine of the angle between them (cosine similarity). Second, the common vocabulary is mapped onto the ontology hierarchy graph, and the structural relevancy is obtained by calculating weighted paths in this directed acyclic graph. Finally, the correlation degree between a news page and the ontology is computed by combining the two measures, and the category of the page is determined by comparing this degree with a predefined threshold.
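The three stages described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Keyword extraction and dimensionality reduction are assumed to have already produced a term-to-weight map for the page. The ontology hierarchy is represented as a hypothetical adjacency dict rooted at the category concept. The weighted-path score (each node weighted by `decay ** depth`), the mixing weight `alpha`, and the `threshold` are all hypothetical choices, because the abstract does not specify them.

```python
import math
from collections import deque


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expectation_vector(category_terms, page_keywords):
    """Align page keyword weights to the order of the ontology category terms.

    Ontology terms that do not occur on the page get weight 0.
    """
    return [page_keywords.get(term, 0.0) for term in category_terms]


def node_depths(dag, root):
    """Shortest hop count from root to every reachable node (breadth-first search)."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in dag.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


def structural_relevancy(dag, root, common_terms, decay=0.5):
    """Hypothetical weighted-path score in [0, 1].

    Each ontology node is weighted by decay ** depth; the score is the weight
    of the nodes matched by the page divided by the total node weight.
    """
    depths = node_depths(dag, root)
    total = sum(decay ** d for d in depths.values())
    matched = sum(decay ** depths[t] for t in set(common_terms) if t in depths)
    return matched / total if total else 0.0


def classify(category_terms, category_weights, dag, root, page_keywords,
             alpha=0.5, threshold=0.5):
    """Combine content similarity and structural relevancy, then threshold.

    alpha and threshold are hypothetical defaults, not values from the paper.
    Returns (correlation_degree, belongs_to_category).
    """
    expected = expectation_vector(category_terms, page_keywords)
    content = cosine_similarity(category_weights, expected)
    common = [t for t in category_terms if t in page_keywords]
    structure = structural_relevancy(dag, root, common)
    score = alpha * content + (1 - alpha) * structure
    return score, score >= threshold
```

As a usage example, take a hypothetical "sports" ontology with children "football", "basketball" and "league" (and "goal" under "football"), category weights `[1.0, 0.8, 0.8, 0.5, 0.6]`, and a page whose keywords are `{"football": 3.0, "goal": 2.0, "league": 1.0, "weather": 1.0}`. The content similarity is about 0.63, the structural relevancy is 1.25 / 2.75 ≈ 0.45, and the combined degree of about 0.54 exceeds the 0.5 threshold, so the page is assigned to the category. The linear combination is only one plausible way to merge the two measures; a different weighting or a product would fit the same outline.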
